A New Soft Computing Method for K-Harmonic Means Clustering

The K-harmonic means clustering algorithm (KHM) is a new clustering method used to group data such that the sum of the harmonic averages of the distances between each entity and all cluster centroids is minimized. Because it is less sensitive to initialization than K-means (KM), many researchers have recently been attracted to studying KHM. In this study, the proposed iSSO-KHM is based on an improved simplified swarm optimization (iSSO) and integrates a variable neighborhood search (VNS) for KHM clustering. As evidence of the utility of the proposed iSSO-KHM, we present extensive computational results on eight benchmark problems. From the computational results, the comparison appears to support the superiority of the proposed iSSO-KHM over previously developed algorithms for all experiments in the literature.


Introduction
Clustering is perhaps the best-known data mining technique for grouping data according to certain criteria. In past decades, clustering has attracted much attention and has become an increasingly important tool because of its wide and valuable applications in data analysis across various fields, such as the natural sciences, psychology, medicine, engineering, economics, and marketing.
Clustering is an NP-hard problem with computational effort growing exponentially with the problem size [1][2][3]. There are two categories among all existing clustering algorithms: hierarchical clustering and partition clustering [3]. The former builds a hierarchy tree of data that successively merges similar clusters, while the latter begins with a random partition and refines it iteratively [3].
To address the sensitivity of KM to initial starting points, the K-harmonic means (KHM) algorithm was proposed by Zhang [7] in 1999. However, KHM may still become trapped in a local optimum. Hence, the main focus of KHM research has shifted to developing soft computing methods that avoid the local trap problem and reduce numerical difficulties, such as the tabu K-harmonic means [9], simulated annealing based KHM [10], the particle swarm optimization (PSO) KHM (PSO-KHM) [11], the hybrid data clustering algorithms based on ant colony optimization and KHM [12], a variable neighborhood search (VNS) for KHM clustering [10], the multi-start local search for KHM clustering (MLS) [13], the gravitational search algorithm based KHM [14], the candidate groups search combined with K-harmonic mean (CGS-KHM) [15], the simplified swarm optimization based KHM (SSO-KHM) [16], the statistical feature extraction modeling KHM [29], the PSO hybrid with tabu search for KHM clustering [30], the firefly [31] and the enhanced firefly algorithm [32] for KHM clustering, the fish school search algorithm [33], and the genetic hybrid with gravitational search for KHM clustering [34].
Soft computing is able to help the traditional KHM methods escape from the local optimum trap and obtain better results [7][8][9][10][11][12][13][14][15][16]. However, the update mechanisms of these soft computing methods are either too tedious, which then requires extra computational efforts, or too weak in their local search, which requires more time for convergence [16]. Thus, there is always a need to have a better soft computing method for KHM clustering.
In this paper, a new algorithm, iSSO-KHM, is proposed to help KHM escape from local optima by installing a new update mechanism into SSO and integrating it with KHM. The rest of the paper is organized as follows: Section 2 provides a description of KHM and an overview of SSO. The novel one-variable difference update mechanism and the survival-of-the-fittest policy, which are the two core components of the proposed iSSO-KHM, are introduced in Section 3. Section 4 compares the proposed iSSO-KHM with four recently introduced KHM-based algorithms on eight benchmark datasets adopted from the UCI database to demonstrate the performance of the proposed iSSO-KHM. Finally, concluding remarks are summarized in Section 5.

Overview of SSO and KHM
The proposed iSSO-KHM is based on both SSO and KHM. Before the proposed iSSO-KHM is discussed, the basic SSO and KHM algorithms are introduced formally in this section.
Let $N_{sol}$ be the number of solutions that are initialized randomly, $K$ be the number of variables (i.e., the number of centroids), $c_i = (c_{i,1}, c_{i,2}, \ldots, c_{i,K})$ be the $i$th solution inside the problem space with a fitness value $F(c_i)$ determined by the fitness function $F$ to be optimized, pBest $P_i = (p_{i,1}, p_{i,2}, \ldots, p_{i,K})$ be the solution with the best fitness function value in the history of the $i$th solution, and gBest $P_{gBest} = (p_{gBest,1}, p_{gBest,2}, \ldots, p_{gBest,K})$ be the solution with the best fitness function value among all pBests, where $i = 1, 2, \ldots, N_{sol}$ and $gBest \in \{1, 2, \ldots, N_{sol}\}$.
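Analogous to all other soft computing techniques, SSO searches for optimal solutions by updating generations. In every generation of SSO, each variable $c_{j,k}$ is updated according to the following simple step function after $C_w$, $C_p$, and $C_g$ are given:

$$c_{j,k} = \begin{cases} c_{j,k} & \text{if } \rho \in [0, C_w) \\ p_{j,k} & \text{if } \rho \in [C_w, C_p) \\ p_{gBest,k} & \text{if } \rho \in [C_p, C_g) \\ x & \text{if } \rho \in [C_g, 1] \end{cases} \qquad (1)$$

where $j = 1, 2, \ldots, N_{sol}$; $k = 1, 2, \ldots, K$; $\rho$ is a random number generated from the uniform distribution within [0, 1]; $x$ is a new randomly generated feasible value; and $C_w$, $C_p - C_w$, $C_g - C_p$, and $1 - C_g$ are the predefined probabilities that $c_{j,k}$ keeps the same value (i.e., no change), is replaced by $p_{j,k}$ of its pBest, is replaced by $p_{gBest,k}$ of gBest, and is regenerated as a new randomly generated feasible value, respectively [16][17][18][19][20][21][22][23][24][25][26][27][28].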
Moving toward pBest is a local search; moving toward gBest is a global search. Moving toward a randomly generated feasible value is also a global search to maintain population diversity and enhance the capacity of escaping from a local optimum. Thus, each solution is a compromise among the current solution, pBest, gBest, and a random movement; this process combines local search and global search, yielding high search efficiency [16][17][18][19][20][21][22][23][24][25][26][27][28].
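The step function described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the bounds `lower` and `upper`, and the injectable `rng` argument are assumptions made for illustration, and one uniform draw per variable is assumed.

```python
import random

def sso_update(c, p_best, g_best, Cw, Cp, Cg, lower, upper, rng=random):
    """Return a new solution built variable by variable with the SSO step function.

    Assumes 0 <= Cw <= Cp <= Cg <= 1 (cumulative thresholds).
    """
    new = []
    for k in range(len(c)):
        rho = rng.random()                 # one uniform draw per variable
        if rho < Cw:
            new.append(c[k])               # keep the current value
        elif rho < Cp:
            new.append(p_best[k])          # copy from this solution's pBest
        elif rho < Cg:
            new.append(g_best[k])          # copy from gBest
        else:
            new.append(rng.uniform(lower, upper))  # regenerate a random value
    return new
```

Because every variable is redrawn independently, a single call can change anywhere from none to all of the K variables, which is the behavior the later sections contrast with a one-variable update.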

The KHM
KHM is similar to KM [7][8][9][10][11][12][13][14][15][16]. It is also a center-based partition clustering method and randomly selects K initial centroids at the beginning. The major difference between KHM and KM is that KHM uses the harmonic averages of the distances from each data point to the centers as components of its performance function. The details of the KHM clustering algorithm are as follows [7][8][9][10][11][12][13][14][15][16]:

KHM PROCEDURE.
STEP K1. Select K initial centroids $c_1, c_2, \ldots, c_K$ randomly, where $c_k$ is the centroid of the $k$th cluster; let $F^*$ be a large number, and provide a tolerance $\varepsilon$.
STEP K2. Calculate the fitness function:

$$F(c_1, c_2, \ldots, c_K) = \sum_{i=1}^{N} \frac{K}{\sum_{k=1}^{K} \frac{1}{\|X_i - c_k\|^{p}}} \qquad (2)$$

where $\|X_i - c_k\|$ is the Manhattan distance between data $X_i$ and centroid $c_k$, and $p$ is the power applied to that distance.
STEP K3. If $|F^* - F(c_1, c_2, \ldots, c_K)| < \varepsilon$, then halt and go to STEP K7; else, let $F^* = F(c_1, c_2, \ldots, c_K)$.
STEP K4. Calculate the membership of each data $X_i$ to centroids $c_k$ for $i = 1, 2, \ldots, N$ and $k = 1, 2, \ldots, K$ as below:

$$M(c_k, X_i) = \frac{\|X_i - c_k\|^{-p-2}}{\sum_{j=1}^{K} \|X_i - c_j\|^{-p-2}} \qquad (3)$$

STEP K5. Calculate the weight of each data $X_i$ for $i = 1, 2, \ldots, N$ as below:

$$W(X_i) = \frac{\sum_{k=1}^{K} \|X_i - c_k\|^{-p-2}}{\left(\sum_{k=1}^{K} \|X_i - c_k\|^{-p}\right)^{2}} \qquad (4)$$

STEP K6. Calculate the new centroid $c_k$ for $k = 1, 2, \ldots, K$ as below and go to STEP K2:

$$c_k = \frac{\sum_{i=1}^{N} M(c_k, X_i)\, W(X_i)\, X_i}{\sum_{i=1}^{N} M(c_k, X_i)\, W(X_i)} \qquad (5)$$

STEP K7. Assign each data $X_i$ to the cluster $k$ with the largest $M(c_k, X_i)$.

STEP K2 calculates the fitness function $F(c_1, c_2, \ldots, c_K)$ of KHM by summing up all harmonic averages of the distances between each data point and all centroids. STEP K3 defines the stopping criterion for KHM. In STEP K4, KHM employs the membership function $M(c_k, X_i)$ to measure the influence of the centroid $c_k$ on data $X_i$. This membership function determines which cluster each data point belongs to in STEP K7. STEP K5 assigns a dynamic weight $W(X_i)$ to each data point such that data points that are not close to any centroid receive larger weights, which prevents multiple centroids from crowding together. STEP K6 updates the current centroids.
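The KHM procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' code: it uses the standard KHM membership, weight, and centroid formulas with the Manhattan distance named in the text, the function names are hypothetical, the stopping test is assumed to be an absolute change in fitness below `tol`, and distances are clamped at a small `eps` (an assumption) to avoid division by zero when a centroid coincides with a data point.

```python
def manhattan(x, c):
    return sum(abs(a - b) for a, b in zip(x, c))

def khm_fitness(X, C, p, eps=1e-12):
    """STEP K2: sum over data points of the harmonic average of d(x, c_k)^p."""
    total = 0.0
    for x in X:
        inv = sum(1.0 / max(manhattan(x, c), eps) ** p for c in C)
        total += len(C) / inv
    return total

def khm_step(X, C, p, eps=1e-12):
    """STEPs K4-K6: memberships, weights, and the resulting new centroids."""
    K, dim = len(C), len(X[0])
    num = [[0.0] * dim for _ in range(K)]
    den = [0.0] * K
    for x in X:
        d = [max(manhattan(x, c), eps) for c in C]
        a = [dk ** (-p - 2) for dk in d]
        sa = sum(a)
        b = sum(dk ** (-p) for dk in d)
        w = sa / (b * b)                   # weight W(x)
        for k in range(K):
            mw = (a[k] / sa) * w           # membership M(c_k, x) times W(x)
            den[k] += mw
            for j in range(dim):
                num[k][j] += mw * x[j]
    return [[num[k][j] / den[k] for j in range(dim)] for k in range(K)]

def khm(X, C, p=2.0, tol=1e-6, max_iter=100):
    """Iterate STEPs K2-K6 until the fitness changes by less than tol."""
    previous = float("inf")
    for _ in range(max_iter):
        F = khm_fitness(X, C, p)
        if abs(previous - F) < tol:        # STEP K3 (assumed form of the test)
            break
        previous = F
        C = khm_step(X, C, p)
    return C, khm_fitness(X, C, p)

def assign_clusters(X, C):
    """STEP K7: largest membership is equivalent to the nearest centroid."""
    return [min(range(len(C)), key=lambda k: manhattan(x, C[k])) for x in X]
```

On a toy one-dimensional dataset such as `[[0.0], [1.0], [10.0], [11.0]]` with two starting centroids, the centroids settle near the middle of each pair of points and the fitness drops sharply after the first pass.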

The Proposed iSSO-KHM
Based on the novel one-variable difference update mechanism and the policy of survival of the fittest, the proposed iSSO-KHM is able to find a good solution without needing to explore all possible combinations of solutions. These two parts, i.e., the novel one-variable difference update mechanism and the policy of survival of the fittest, are discussed in this section.

The one-variable difference update mechanism
Each soft computing method has its own generic update mechanism and numerous revised update mechanisms for different applications in various situations. In most soft computing methods, the update mechanism is changed only slightly. For example, the update mechanism of PSO is a vector-based update mechanism using the following two equations, where $X_i^t$ and $V_i^t$ are the position and velocity of the $i$th solution at generation $t$, $w$ is the inertia weight, and $c_1$ and $c_2$ are two constants:

$$V_i^{t+1} = w V_i^{t} + c_1 \rho_1 (P_i - X_i^{t}) + c_2 \rho_2 (P_{gBest} - X_i^{t}) \qquad (6)$$

$$X_i^{t+1} = X_i^{t} + V_i^{t+1} \qquad (7)$$

Note that all variables in the same solution share two random variables in PSO, i.e., $\rho_1$ and $\rho_2$, which are generated randomly from a uniform distribution within [0, 1] in Eq 6. In the artificial bee colony (ABC) algorithm, one variable of each solution is selected randomly for updating. The update operators in the traditional genetic algorithm (GA) change either two variables via one-cut-point mutation or up to half the number of variables via one-cut-point crossover. In the traditional SSO, however, all variables are updated simultaneously based on Eq 1.
To reduce the number of random values and to change solutions gradually without breaking the trend and stability in the convergent status, only one variable is updated in each solution for each iteration in the proposed iSSO-KHM. Another reason to adopt the one-variable update mechanism is that KHM is essentially insensitive to the initial conditions and only needs to refine its solution [7][8][9][10][11][12][13][14][15][16].
The update mechanism listed in Eq 1 is more suitable for discrete data, whereas each variable of a centroid is a floating-point value in KHM. Hence, the step function in Eq 1 is also revised for floating-point data in the novel one-variable difference update mechanism for the proposed iSSO-KHM (Eq 8), where $\rho_1$, $\rho_2$, and $\rho_c$ are random numbers generated from the uniform distribution within [0, 1]. Note that $C_g = 0.4$ and $C_w = 0.6$ in this study, the role of pBest is removed, and the comparison order is $C_g$ first and then $C_w$ in the step function of Eq 8, which is different from Eq 1. For example, let $c_3 = (1.3, 4.5, 6.7, 8.9)$ be the current solution, $c_{gBest} = c_6 = (2.7, 7.6, 5.4, 9.8)$ be the gBest, $c_x = c_5 = (2.3, 5.5, 7.7, 9.9)$ and $c_y = c_7 = (6.2, 8.5, 1.7, 4.9)$ be two randomly selected solutions, and the third variable (i.e., $c_{3,3}$) be selected randomly for updating. Assume that $\rho_1 = 0.3$ and $\rho_2 = 0.6$ are generated randomly. Table 1 shows the newly updated $c_3$ for three different cases resulting from three different values of $\rho_c$.
Unlike SSO, the proposed one-variable difference update mechanism only updates one variable and places more emphasis on the local search. Additionally, KHM is less sensitive to the

The complete pseudocode of the proposed iSSO-KHM
As in the existing related KHM algorithms, the KHM procedure discussed in Section 2.2 is implemented in iSSO-KHM to calculate the fitness of each solution, and it acts as a local search to further improve each updated solution heuristically. The steps of the complete pseudocode of the proposed iSSO-KHM are described as follows. iSSO-KHM PROCEDURE.

STEP 2. Let j = 1.
STEP 11. If the runtime is less than the predefined T, then go to STEP 2; otherwise, $c_{gBest}$ is the final solution, and halt.
In the above, STEP 0 simply runs the KHM procedure for each randomly generated solution to calculate its fitness function and update the solution. STEP 1 finds the first gBest from this initial population after using the KHM procedure. STEPs 2-12 implement the proposed one-variable difference update mechanism; STEPs 9 and 10 are based on the survival-of-the-fittest policy to decide whether to accept the updated solution or replace gBest. Note that the stopping criterion in STEP 11 is the runtime T, and T = 0.1, 0.3, and 0.5 CPU seconds in the experiments reported in Section 4.
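The overall control flow described above (loop over the solutions, accept an update only if it improves the solution, replace gBest when it is beaten, stop when the runtime budget T is spent) can be sketched as follows. This is a hedged illustration, not the authors' pseudocode: the function names are hypothetical, `propose` stands in for the one-variable difference update, and the `nudge_one_variable` proposal is a placeholder that is explicitly not Eq 8. In the paper's setting, `fitness` would be the KHM fitness after the KHM local search.

```python
import random
import time

def survival_of_fittest_search(solutions, fitness, propose, time_limit, rng=random):
    """Minimize `fitness` until `time_limit` CPU seconds have elapsed."""
    solutions = [list(s) for s in solutions]
    scores = [fitness(s) for s in solutions]
    g = min(range(len(solutions)), key=lambda j: scores[j])   # index of gBest
    start = time.process_time()
    while time.process_time() - start < time_limit:
        for j in range(len(solutions)):
            candidate = propose(solutions, j, g, rng)   # hypothetical update rule
            f = fitness(candidate)
            if f < scores[j]:              # keep the update only if it improves
                solutions[j], scores[j] = candidate, f
                if f < scores[g]:          # the improved solution replaces gBest
                    g = j
    return solutions[g], scores[g]

def nudge_one_variable(solutions, j, g, rng):
    """Placeholder proposal: perturb one randomly chosen variable (NOT Eq 8)."""
    new = list(solutions[j])
    k = rng.randrange(len(new))
    new[k] += rng.uniform(-0.1, 0.1)
    return new
```

Because a candidate is discarded unless it strictly improves its solution, the best fitness found never gets worse as the runtime grows, which is consistent with using a fixed CPU-time budget as the stopping criterion.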

Experimental Results
In this section, we present the computational results of the comparisons among the proposed algorithm and existing algorithms on eight benchmark datasets to test the performance of iSSO-KHM.

The Experimental Setting
To evaluate the efficiency and effectiveness (i.e., the solution quality) of the proposed iSSO-KHM, eight benchmarks adopted from UCI are tested: Abalone (denoted by A, 4177 Moreover, iSSO-KHM is compared to four KHM-related soft computing algorithms: CGS_KHM, MLS_KHM, PSO_KHM, and SSO_KHM. Note that CGS_KHM has better performance than tabu search and VNS for the Iris, Glass and Wine datasets. All five algorithms were implemented in C++ with default options: CGS_KHM (denoted by CGS), iSSO-KHM (denoted by iSSO), MLS_KHM (denoted by MLS), PSO_KHM (denoted by PSO), and SSO_KHM (denoted by SSO). All codes were run on a 64-bit Windows 10 operating system with an Intel Core i7-5960X 3.00 GHz CPU and 16 GB of RAM.
In the experiments, all values of K are set to three; the power p applied to the Manhattan distance is p = 1.5, 2.0, and 2.5; and the runtime limit, which serves as the stopping criterion, is T = 0.1, 0.3, and 0.5 CPU seconds. For each test and algorithm, the number of solutions is 15 (i.e., Nsol = 15), the number of independent runs is 55, and only the best 50 results are recorded to remove possible outliers.
In all tables listed in S1 Appendix and the following two subsections, the notations $F_{avg}$, $F_{min}$, $F_{max}$, and $F_{std}$ denote the average, minimal (the best), maximal (the worst) and standard deviation of the fitness values obtained from the related algorithms. Additionally, the notations $f_{avg}$, $f_{min}$, $f_{max}$ and $f_{std}$ represent the number of $F_{avg}$, $F_{min}$, $F_{max}$ and $F_{std}$ values that are the best among all algorithms under the same related conditions, e.g., p, T, and/or dataset.
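The statistics defined above can be computed with a short Python sketch. It rests on several assumptions that the text does not settle: the function names are hypothetical, "best" is taken to mean the lowest value for every index (including the standard deviation), the sample standard deviation is used, and tied algorithms are all credited with a best value.

```python
import statistics

def summarize_runs(fitness_values, keep=50):
    """Keep the `keep` best (smallest) fitness values and summarize them."""
    kept = sorted(fitness_values)[:keep]
    return {
        "F_avg": statistics.mean(kept),
        "F_min": kept[0],
        "F_max": kept[-1],
        "F_std": statistics.stdev(kept),   # sample standard deviation (assumption)
    }

def count_best(table, index):
    """table: list of {algorithm: summary}, one entry per (dataset, p, T) condition.

    Returns how many conditions each algorithm wins on the chosen index.
    """
    counts = {}
    for condition in table:
        best = min(summary[index] for summary in condition.values())
        for name, summary in condition.items():
            if summary[index] == best:     # ties credit every tied algorithm
                counts[name] = counts.get(name, 0) + 1
    return counts
```

For instance, calling `summarize_runs` on the 55 fitness values of one algorithm under one condition discards the five worst runs before the four statistics are taken, and `count_best(table, "F_min")` then yields the kind of tally reported as $f_{min}$.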
To compare the efficiency of the update mechanism of the proposed iSSO, the average number of fitness calculations ($N_{avg}$) and the number of best $N_{avg}$ values, represented by $n_{avg}$, are recorded. Note that for a fixed T, a higher $N_{avg}$ means that the related update mechanism is more efficient and increases the search performance for finding an optimal solution.
To properly evaluate the clustering method, the F measure value is provided, and the number of best F measure values [35,36] is represented by $f_{mea}$. The F measure is one of the standard clustering validity measures based on the ideas of precision and recall from information retrieval [35,36]. The larger the F measure is, the higher the quality of the clustering is.
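As an illustration of how precision and recall combine into a clustering F measure, the sketch below uses one common class-to-cluster formulation: each true class is matched with the cluster that gives it the highest F value, and the per-class values are averaged with weights proportional to class size. Whether this is exactly the variant defined in [35,36] is an assumption, and the function name is hypothetical.

```python
from collections import Counter

def clustering_f_measure(labels_true, labels_pred):
    """Class-size-weighted best-match F measure (larger is better, at most 1)."""
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j         # share of the cluster that is this class
            recall = n_ij / n_i            # share of the class found in the cluster
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total
```

The measure does not depend on how clusters are numbered, so a perfect partition scores 1 even if its cluster labels are a permutation of the class labels.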
All experimental results are listed in S1 Appendix. S1 Appendix demonstrates that iSSO has achieved better solutions for each test problem with lower standard deviations and higher fitness computation numbers compared to the other methods.

General Observations for $f_{avg}$, $f_{min}$, $f_{max}$, $f_{std}$, $n_{avg}$, and $f_{mea}$
All results in S1 Appendix are ranked and discussed in this subsection. Tables 2-5 summarize these rankings based on different T and p, T only, p only, and algorithm only, respectively. The letter next to the number denotes the related dataset, e.g., B2S denotes one best value in dataset B and two best values in dataset S.
From Table 2, iSSO has higher numbers in $f_{avg}$, $f_{min}$, $f_{max}$, $f_{std}$, $n_{avg}$, and $f_{mea}$ than the other methods for the different settings of T and p. Hence, iSSO is more efficient, effective, and robust than the other methods.

Table 3 summarizes the values of $f_{avg}$, $f_{min}$, $f_{max}$, $f_{std}$, $n_{avg}$, and $f_{mea}$ for T = 0.1, 0.3, and 0.5 separately. We can observe from Table 3 that the longer the runtime is, the better the solution quality obtained from iSSO. For example, $f_{min}$ increases from 18 to 23 from T = 0.1 to T = 0.2. PSO is the second best in $f_{mea}$ for both T = 0.1 and 0.3; SSO is the second best in $f_{min}$ for T = 0.1 and 0.2, and in $f_{mea}$ for T = 0.3. Additionally, as seen from Table 3, the advantage of iSSO over the other methods grows with the runtime, e.g., there are six cases in which the $F_{min}$ of another method is better than that of iSSO for T = 0.1 but none for T = 0.3.

Table 4 sums up the values of $f_{avg}$, $f_{min}$, $f_{max}$, $f_{std}$, $n_{avg}$, and $f_{mea}$ for p = 1.5, 2.0, and 2.5 separately. It is evident that iSSO is still the best method compared to the others in all aspects. According to published results, other methods work more effectively when p = 2.0 [7][8][9][10][11][12][13][14][15][16]. However, given the results, iSSO retains its performance regardless of the value of p. For example, $f_{min}$ is 21 for p = 1.5 and 22 for both p = 2.0 and 2.5. Interesting observations can still be found, as in Table 3, where, in general, CGS yields better results for the S dataset than for the other datasets. PSO and SSO follow in performance with $f_{mea}$ for p = 2.0 and p = 2.5, respectively.

Table 5 lists the overall values of $f_{avg}$, $f_{min}$, $f_{max}$, $f_{std}$, $n_{avg}$, and $f_{mea}$ for CGS, iSSO, MLS, PSO, and SSO separately. In general, it seems that iSSO is only slightly more powerful within dataset A, as $f_{min} = 4$ and $f_{mea} = 5$ for SSO, and within dataset S, as $f_{min} = 2$ for MLS, $f_{mea} = 2$ for PSO, and $f_{avg} = 5$ and $f_{max} = f_{std} = 6$ for CGS. This is similar to what is observed in Tables 2 and 3. However, the number of best values for iSSO in every statistical index is still more than 6.2 times that of the other methods. For example, for $f_{std}$, CGS produced nine best values (three in the B dataset and six in the S dataset), whereas iSSO produced 64 best values. This trend is also found when iSSO is compared across all algorithms and thus demonstrates that iSSO outperforms the other algorithms in almost all aspects.

General Observations for $F_{avg}$, $F_{min}$, $F_{max}$, $F_{std}$, $N_{avg}$, and F measure
In general, each result obtained using the proposed iSSO is better than those obtained using the other methods, as reported in S1 Appendix and Section 4.2. For a more detailed analysis, the top five values of $F_{min}$ for each dataset under all settings of T and p are summarized in Table 6 and discussed in this subsection.
In Table 6, the proposed iSSO has the largest number (33) of results among the top five values of $F_{min}$, whereas SSO, PSO, and CGS have four, two, and one, respectively. Note that in most cases SSO yields better results than CGS and MLS, as seen in Table 6, but all of the values of $f_{avg}$, $f_{min}$, $f_{max}$, and $f_{std}$ are zero for SSO in Section 4.2. Additionally, the top four $F_{min}$ values are all obtained from the proposed iSSO for all datasets except A and S, in which iSSO has only the top two $F_{min}$ values, SSO has the third and fourth best, and PSO has the fifth best. It seems that the algorithm with the best $F_{min}$ also has the best $F_{avg}$, $F_{max}$, $F_{std}$, and $N_{avg}$ in all datasets. However, having the best $F_{min}$ does not guarantee that an algorithm's F measure is also the best; this applies to the A, B, C, G, I and W datasets.

Table 4. The values of $f_{avg}$, $f_{min}$, $f_{max}$, $f_{std}$, $n_{avg}$, and $f_{mea}$ for p = 1.5, 2.0, and 2.5.

Conclusions
In this work, a new soft computing method called iSSO-KHM is proposed to solve the KHM clustering problem. The proposed iSSO-KHM adapts the fundamental concepts of both the traditional SSO and KHM by adding the novel one-variable difference update mechanism to update solutions and the survival-of-the-fittest policy to decide whether to accept the newly updated solutions.
The experimental results show the superiority of iSSO-KHM over the other four algorithms for almost all eight benchmark datasets. Hence, iSSO-KHM can achieve a trade-off between exploration and exploitation to generate a good approximation in a limited computation time systematically, efficiently, effectively, and robustly.
However, the experiments in Section 4 show that an improved $F_{min}$ value does not mean that the F measure is also improved. Therefore, a potential area of exploration would be to include the F measure in the fitness function to improve both $F_{min}$ and the F measure. Another limitation of the proposed algorithm is that $C_g$ and $C_w$ in Eq 8 of the proposed update mechanism must be known in advance. This raises another practical problem, namely to develop a parameter-free version of the proposed algorithm in the future.
As there are some recently proposed swarm-based clustering algorithms, more comparisons of the proposed algorithm with other well-known swarm-based clustering algorithms are needed in the future. In Section 4, "Experimental results", the parameter K is fixed to 3. The proposed approach will also be compared with the other versions of KHM for different values of K (as was done for the p and T parameters).