Improved multi-objective clustering algorithm using particle swarm optimization

Multi-objective clustering has received widespread attention in recent years, as it can obtain more accurate and reasonable solutions. In this paper, an improved multi-objective clustering framework using particle swarm optimization (IMCPSO) is proposed. Firstly, a novel particle representation for the clustering problem is designed to help PSO search for clustering solutions in continuous space. Secondly, the distribution of the Pareto set is analyzed. The analysis results are applied to the leader selection strategy, which helps the algorithm avoid becoming trapped in local optima. Moreover, a clustering solution-improvement method is proposed, which greatly increases the efficiency of searching for clustering solutions. In the experiments, 28 datasets are used and nine state-of-the-art clustering algorithms are compared; the proposed method is superior to the other approaches in terms of the ARI evaluation index.


Introduction
Knowledge discovery is a process of analyzing data from different perspectives and summarizing it into useful information [1]. These methods include a number of technical approaches, such as classification, data summarization, regression and clustering [2]. Technically, clustering is the process of grouping data instances based on one or more features of the data [3]. As a popular data analysis technique, clustering has attracted particular attention from researchers in recent years because of the need to analyze and understand information hidden in datasets coming from different sources. Cluster analysis has been applied in a wide range of areas such as face recognition [4], spatial database analysis [5] and traffic incidents [6].
The most popular algorithms among the various clustering techniques are K-means [7] and hierarchical clustering [8,9]. The K-means algorithm creates a partition of the dataset into k clusters, where k is predefined [10]. Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. In the clustering process, those methods focus on obtaining groups by optimizing a single objective. However, in some real-world problems it is necessary to optimize several objective functions simultaneously [11]. Furthermore, those classical algorithms suffer from the disadvantages of sensitivity to initial centroid selection, local optima and a low convergence rate [1].
To overcome the drawbacks of these algorithms, PSO-based clustering methods have been studied by the academic community. Particle Swarm Optimization (PSO) is inspired by the social behavior of animals such as fish schooling and bird flocking. It has become an efficient method for finding approximate optimal solutions due to its simplicity, few parameters and global exploration ability on some complex problems [12]. In addition, PSO is well suited to multi-objective optimization, as its use of a population enables the whole Pareto set to be approximated in a single algorithm run [3].
In the early years, researchers used PSO to perform single-objective optimization on clustering problems. An implementation of PSO for data clustering was introduced in [13]; in that algorithm, standard gbest PSO is used to find the centroids of a specified number of clusters. Shen et al. presented a mountain clustering algorithm based on improved particle swarm optimization (MCBIPSO) [14]. The improved PSO algorithm is used to find all peaks of the mountain function, making the calculation easier and more efficient in deciding the cluster centers of the data samples. Single-objective PSO-based methods have been used in document clustering [15], image segmentation [16] and spatial datasets [17].
With the study of multi-objective PSO algorithms, some of them have been introduced to solve clustering problems. Yang et al. proposed a hybrid clustering algorithm based on PSO and K-harmonic means (PSOKHM) in [18]. They show that the PSOKHM algorithm increases the convergence speed of PSO and is also capable of escaping from local optima. Abubaker et al. proposed a new automatic clustering algorithm based on multi-objective PSO and Simulated Annealing (MOPSOSA) in [19]. This method simultaneously optimizes three different objective functions, which are used as cluster validity indexes for finding the proper number of clusters. Armano et al. combined the combinatorial form of PSO with a locus-based adjacency genetic scheme in [20]; the algorithm is robust and outperforms the alternatives in a comprehensive experimental study.
However, previous studies have three limitations. Firstly, clustering is a combinatorial problem and the curse of dimensionality may be encountered on large datasets, which leads to a dramatic deterioration in algorithm performance. Secondly, the distribution of the Pareto set in the clustering problem has a unique feature, which makes the selection of the global leader harder. Thirdly, the particle representation in previous studies is in discrete space, whereas PSO is better suited to optimization in continuous space [21].
The purpose of this paper is to address the limitations of previous work. Towards this goal, an improved multi-objective clustering framework using particle swarm optimization (IMCPSO) is proposed, which finds well-separated and compact clusters without a predefined number of clusters. Threefold efforts have been made in this paper. Firstly, it improves the particles during the iterations, which yields better solutions with fewer iterations and alleviates the curse of dimensionality. Secondly, it analyzes the distribution of clustering solutions, which enhances performance in leader selection and decision making. In addition, we design a novel particle representation that helps PSO search for clustering solutions in continuous space. Experiments show that IMCPSO is superior to other approaches in terms of accuracy.
The rest of this paper is organized as follows. Section 2 briefly describes related work on clustering algorithms and multi-objective PSO. Section 3 presents the IMCPSO approach in detail. A comprehensive set of experimental results is provided in Section 4. Finally, Section 5 reports the conclusions and suggests some directions for future work.


Related work

Clustering algorithms
Clustering algorithms fall into several broad categories. In partitional clustering, data points can move from one cluster to another, and the number of clusters k is usually required in advance, e.g. K-means [7]. Hierarchical clustering consists of a sequence of partitions in a hierarchical structure, and takes the form of either an agglomerative or a divisive method, e.g. average-link hierarchical clustering [8] and the Ward method [24]. Density-based methods group the dataset into clusters based on density conditions: clusters of dense regions are separated by regions of low density, as in DBSCAN [25,26] and Mean shift [27].
In addition, graph-theoretical techniques are used in clustering, e.g. spectral clustering [28] and BIRCH [29]. More recently, AP clustering [30] was proposed, which uses messages passed between data points to determine the cluster centers.

Multi-objective PSO
In the PSO framework, each particle represents a potential solution, and the swarm achieves global optimization by moving particle positions in a D-dimensional search space. The velocity v_i and the position x_i are updated as follows:

v_i(t+1) = v_i(t) + c_1 r_1 (pbest_i - x_i(t)) + c_2 r_2 (gbest - x_i(t))    (1)

x_i(t+1) = x_i(t) + v_i(t+1)    (2)

where v_i = (v_{i1}, v_{i2}, ..., v_{iD}) is the velocity of particle i, t is the generation number, c_1 and c_2 are the local and global learning factors respectively, r_1 and r_2 are random numbers in [0, 1], pbest_i is the previous best position of particle i, and gbest is the global best position found so far by the entire swarm. The particle position is updated by Eq (2). With an inertia weight factor w added, the algorithm has better control over the search scope and achieves better results on certain problems [31], and the particle velocity is adjusted to:

v_i(t+1) = w v_i(t) + c_1 r_1 (pbest_i - x_i(t)) + c_2 r_2 (gbest - x_i(t))    (3)

The first multi-objective particle swarm optimization (MOPSO) was reported by Coello [32]. MOPSO differs from PSO in that it constructs and maintains an external archive using the concept of Pareto optimality. Moreover, its mechanism for selecting gbest and pbest is quite different. Therefore, recent multi-objective PSO works use the concept of Pareto optimality to select non-dominated particles as leaders, in order to make solutions converge to the true Pareto set [33,34].
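As an illustration, one inertia-weight PSO update step can be sketched in Python as follows. The default parameter values (w = 0.85, c1 = c2 = 0.7) are the ones used later in the configuration section; the function name is illustrative.

```python
import random

def pso_step(x, v, pbest, gbest, w=0.85, c1=0.7, c2=0.7):
    """One inertia-weight PSO update: returns the new position and velocity.

    x, v, pbest, gbest are equal-length lists of floats (one entry per
    dimension); r1 and r2 are drawn once per update, as in the paper's Eq (3).
    """
    r1, r2 = random.random(), random.random()
    # velocity update: inertia term + cognitive (pbest) term + social (gbest) term
    v_new = [w * vd + c1 * r1 * (pb - xd) + c2 * r2 * (gb - xd)
             for vd, xd, pb, gb in zip(v, x, pbest, gbest)]
    # position update: move along the new velocity
    x_new = [xd + vd for xd, vd in zip(x, v_new)]
    return x_new, v_new
```

When the particle already sits at both pbest and gbest, only the inertia term remains, so the particle keeps drifting with damped velocity rather than stopping abruptly.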

Proposed method
In this section, we describe the IMCPSO method in detail. As already pointed out, it is based on the multi-objective PSO algorithm. IMCPSO consists of three main parts: objective functions, optimization and decision making. Firstly, two conflicting objective functions are defined with the aim of obtaining compact and well-separated clusters. Secondly, we illustrate the optimization process in three parts: particle representation, leader selection and the solution-improvement process, since the optimization is the key to finding good partitions. Finally, a decision maker is used to select the most suitable solution from the Pareto set.

Objective functions
The clustering problem is defined as follows. Consider the dataset P = {p_1, p_2, ..., p_n}, where p_i = (p_{i1}, p_{i2}, ..., p_{id}) is a d-dimensional feature vector, also referred to as an object, p_{ij} is the feature value of object i at dimension j, and n is the number of objects in P. A clustering of P is a partitioning of P into k clusters {C_1, C_2, ..., C_k} with the following properties: (1) ∪_{i=1}^{k} C_i = P; (2) C_i ∩ C_j = ∅ for all i, j ∈ {1, 2, ..., k} with i ≠ j. Given a clustering solution of the data, numerous measures for estimating its quality exist, and the goal is to obtain the most compact and well-separated clusters. Towards this goal, we introduce two objective functions, overall deviation and mean distance between clusters, to evaluate the compactness and separability of the clusters respectively.
(a) overall deviation

The overall deviation of a clustering solution reflects the overall intra-cluster size of the data. It is computed as:

Dev(C) = Σ_{C_k ∈ C} Σ_{i ∈ C_k} δ(i, μ_k)

where C is the set of all clusters, i is a data element, μ_k is the centroid of cluster C_k, and δ(·,·) is a distance function (e.g. Euclidean distance). As an objective, the overall deviation should be minimized.
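A minimal sketch of the overall-deviation objective, assuming Euclidean distance and data given as a list of coordinate tuples (the function name is illustrative):

```python
def overall_deviation(data, labels):
    """Dev(C): sum over all clusters of the distances from each point
    to its cluster centroid (smaller = more compact clusters)."""
    from math import dist  # Euclidean distance, Python 3.8+

    # group points by cluster label
    clusters = {}
    for point, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(point)

    dev = 0.0
    for members in clusters.values():
        d = len(members[0])
        # component-wise mean gives the cluster centroid
        centroid = [sum(p[j] for p in members) / len(members) for j in range(d)]
        dev += sum(dist(p, centroid) for p in members)
    return dev
```

For two tight clusters far apart, Dev stays small; merging them into one cluster inflates it, which is why this objective alone would favor many tiny clusters and must be balanced by a separation objective.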

(b) mean distance between clusters
This objective function reflects the difference between clusters. It is calculated from the minimum distance to a cluster's neighbors. In clustering analysis, neighborhood is a local concept that reflects the relation between two data points. In this paper, we use the Gabriel graph to obtain the adjacency relation of all data points [35]. The Gabriel graph is a subgraph of the Delaunay triangulation which connects two data points v_i and v_j if there is no other point v_k inside the open ball with diameter [v_i, v_j]. The advantage of the Gabriel graph is that it yields a fully connected graph with appropriate distances [36]. After obtaining the Gabriel graph of all points, the objective function is calculated as follows.
where N_i is the set of neighbors of data point i in the Gabriel graph. As an objective function, Mdc should be maximized. In order to minimize this objective in the same manner as Dev, its value is negated (-Mdc).
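The Gabriel-graph adjacency used by this objective can be computed with a straightforward pairwise check: points i and j are neighbors iff no third point k satisfies d(i,k)² + d(j,k)² < d(i,j)², i.e. no point lies inside the open ball with diameter [v_i, v_j]. This is an O(n³) sketch; a production implementation would prune candidates via the Delaunay triangulation.

```python
from itertools import combinations

def gabriel_graph(points):
    """Return Gabriel-graph adjacency as {index: set of neighbor indices}."""
    def sq(a, b):
        # squared Euclidean distance (avoids sqrt in the ball test)
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    n = len(points)
    nbrs = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        dij = sq(points[i], points[j])
        # edge (i, j) exists iff every other point k lies on or outside
        # the ball with diameter [p_i, p_j]
        if all(sq(points[i], points[k]) + sq(points[j], points[k]) >= dij
               for k in range(n) if k not in (i, j)):
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs
```

For three collinear points the middle one blocks the long edge, so only the two short edges survive, which matches the intuition that Gabriel edges connect "locally close" points.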

Particle representation
Generally speaking, the particle representation is important because each particle represents a potential solution in the PSO framework. Therefore, we design a continuous particle representation for the clustering problem. In this scheme, shown in Fig 1, each particle is represented by a vector of N+1 elements, where N is the number of data points. The first element of the particle vector is the k value and reflects the number of clusters of the particle. The k value lies in the interval (1, k_max], where k_max is the maximum number of clusters. The other N elements of the particle vector are the position values of the N data points, each taking a value in the interval (0,1]. An example of the particle representation is shown in Fig 1(B). When a particle is converted into a clustering solution, the position values of all N data points are multiplied by the k value and then rounded up to obtain cluster labels. The conversion process is shown in Fig 1(C).
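The decoding step can be sketched as follows, assuming the k value is stored directly in the first element and rounded up when decoding (the function name is illustrative):

```python
from math import ceil

def decode_particle(particle):
    """Convert a continuous particle [k, pos_1, ..., pos_N] into cluster
    labels in 1..ceil(k): each position in (0, 1] is scaled by k and
    rounded up, as in the paper's Fig 1(C)."""
    k = ceil(particle[0])              # number of clusters encoded in element 0
    return [ceil(pos * k) for pos in particle[1:]]
```

Because nearby position values map to the same label after scaling, small continuous moves by PSO translate into stable cluster assignments, which is what lets PSO search in continuous space.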
The advantages of the proposed particle representation are manifold. On the one hand, it takes full advantage of PSO searching in continuous space, since the PSO technique is suitable for continuous non-linear optimization problems [37]. On the other hand, partitionings with different numbers of clusters can be represented, because the k value remains free to change. In addition, the velocity update tends to preserve the similar parts and mutate the differing parts of pbest and gbest. This feature helps particles find the global optimum while inheriting the advantages of other solutions. An example of an iteration of IMCPSO is shown in Fig 2.

Leader selection strategy
The analogy of PSO with evolutionary algorithms makes evident the notion that using a Pareto optimality scheme could be the straightforward way to extend the approach to handle multi-objective optimization problems [38]. In general, a multi-objective problem has the form:

minimize F(x) = [f_1(x), f_2(x), ..., f_k(x)]
subject to g_i(x) ≤ 0, i = 1, ..., m and h_j(x) = 0, j = 1, ..., p

where x = [x_1, x_2, ..., x_D]^T is the vector of decision variables, f_i : R^D → R, i = 1, ..., k are the objective functions, and g_i, h_j : R^D → R, i = 1, ..., m, j = 1, ..., p are the constraint functions of the problem. Given two vectors x, y ∈ R^k, we say that x dominates y if x_i ≤ y_i for all i = 1, ..., k and x_i < y_i for at least one i; otherwise x and y are mutually non-dominated. A vector of decision variables is Pareto-optimal if it is non-dominated with respect to the feasible region F, and the Pareto-optimal set (or Pareto set) is P* = {x ∈ F | x is Pareto-optimal} [39]. However, a PSO algorithm with a Pareto optimality scheme produces not one but a set of non-dominated leaders, and the selection of an "appropriate" leader becomes difficult because these non-dominated solutions are all equally good mathematically. The choice of pbest is relatively simple: the non-dominated particle between the current position and the previous best position is adopted as pbest; if neither dominates the other, one is selected randomly [40].
The selection of gbest is more complicated: as the global leader, its location directly determines the search direction of the particles. The main global-leader selection methods can be categorized as follows: (a) stochastic selection within the Pareto set; (b) selection based on crowding density [41]. The former tends to assign greater selection probability in regions where particles are concentrated, which harms the distribution across the Pareto set and reduces the diversity of the population. The latter also tends to lead the particles into local optima, for reasons analyzed next.
Analysis of the Pareto set in the clustering problem. In the clustering problem, the number of clusters k plays an important role in obtaining a good clustering solution. The two objective functions, Dev and -Mdc, favor large k values (more compact) and smaller k values (better separated) respectively. As a result, different k values are relatively clearly distinguished in the Pareto set. Since clustering is a combinatorial problem, the cluster assignment of each data point can take only a limited number of values. When a particular k* produces a good clustering solution, it dominates the other solutions with the same k*, so the Pareto set is relatively sparse around k*. Fig 3 illustrates the Pareto set of a clustering problem; the Pareto set is scattered (ns_1 and ns_2 in Fig 3). For this reason, selection based on crowding density is likely to choose a non-dominated solution like ns_2, which has no other non-dominated solution near it, causing the optimization algorithm to fall into a local optimum. Furthermore, when a partitioning has a good k, the clusters are well-separated (k = 2,3 in Fig 3) and the corresponding Pareto set is very sparse. On the contrary, the Pareto set is dense (k = 4,5 in Fig 3) when k does not yield well-separated clusters.

Leader selection strategy of the whole process. To avoid falling into local optima, we propose a "leader selection strategy of the whole process". The "process particles" are a set containing the positions of the leftmost, middle and rightmost solutions of the Pareto set, as shown in Fig 4. Obviously, the positions of the leftmost and rightmost particles are easily determined.
Before the middle particle is determined, all the Pareto solutions are normalized using the following formula:

x'_{i,j} = (x_{i,j} - min(x_j)) / (max(x_j) - min(x_j))

where x_{i,j} is the value of the ith particle on the jth objective function, and max(x_j) and min(x_j) are the maximum and minimum of the Pareto set on the jth objective function, respectively. The normalized objective value satisfies x'_{i,j} ∈ [0,1]. The magnitude of each objective function thus becomes uniform, and the fitness of each Pareto solution is obtained by summing over the objective functions:

fit_i = Σ_{j=1}^{m} x'_{i,j}

where m is the number of objective functions.
Thus, the middle particle is the Pareto solution with the minimum fit_i; it is clearly a good trade-off among all the objective functions. After the process particles are determined, the global leader is chosen among them, where the selection probability of the middle particle is higher (0.7, set empirically) and the leftmost and rightmost particles are assigned lower probabilities (0.15 each, set empirically). The advantages of the leader selection strategy of the whole process are: (a) it leads particles to traverse all possible Pareto sets and does not easily fall into local optima; (b) it adapts well whether the number of clusters is large (k>10) or small (k<3).
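Assuming "leftmost" and "rightmost" refer to the extremes of the front along the first objective, the whole-process leader selection can be sketched as follows (function names are illustrative):

```python
import random

def normalize(front):
    """Min-max normalize each objective over the Pareto front
    (front is a list of objective vectors, one per solution)."""
    m = len(front[0])
    lo = [min(s[j] for s in front) for j in range(m)]
    hi = [max(s[j] for s in front) for j in range(m)]
    return [[(s[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j in range(m)] for s in front]

def select_gbest(front, p_mid=0.7):
    """Pick the index of the global leader among the leftmost, middle and
    rightmost Pareto solutions. The middle solution (minimum summed
    normalized fitness) is chosen with probability p_mid; the two
    extremes share the remaining probability equally."""
    norm = normalize(front)
    left = min(range(len(front)), key=lambda i: front[i][0])
    right = max(range(len(front)), key=lambda i: front[i][0])
    middle = min(range(len(front)), key=lambda i: sum(norm[i]))
    r = random.random()
    if r < p_mid:
        return middle
    return left if r < p_mid + (1 - p_mid) / 2 else right
```

Biasing selection toward the middle particle pulls the swarm toward balanced trade-offs, while the occasional extreme leader keeps the whole front explored.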

Clustering solution-improved method
As a search-based method, PSO easily suffers from the curse of dimensionality as the clustering dataset grows. As a result, PSO alone obtains only a rough clustering distribution during the search; an example is shown in Fig 5(A). The main reason is that PSO does not use any prior knowledge in the search process. To address this issue, we exploit a property of clustering to improve the solutions: data points that are more similar have a higher probability of being assigned to the same group. This property can be used to improve the rough solution obtained by PSO. Fig 5 illustrates the solution-improvement method; the steps are as follows.
Step 1. Each partition has a topological center, and the centers can be found using the Gabriel graph. First, calculate the C value of each data point in the clustering solution, where the C value is the number of the point's Gabriel-graph neighbors that belong to the same cluster. Then, the data point with the largest C value is the topology center of its partition. If there is more than one candidate topology center, one is chosen randomly.
Step 2. For each topology center, agglomerative clustering is used. It starts from k clusters, each containing one topology center, with all other data points to be reallocated. A series of nested merges is performed until all the data points are grouped into the k clusters. The algorithm processes the Gabriel graph of the N data points and agglomerates according to their similarity or distance. Agglomerative clustering is based on a local connectivity criterion, and its running time is O(N²).
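Step 1 above can be sketched as follows, given the Gabriel-graph adjacency as a dict of neighbor sets (ties are broken here by first occurrence rather than randomly; the function name is illustrative):

```python
def topology_centers(nbrs, labels):
    """For each cluster, pick the point with the largest C value, where the
    C value of a point is the number of its Gabriel-graph neighbors that
    share its cluster label. Returns {cluster label: center index}."""
    # C value per data point
    c_value = {i: sum(1 for j in nbrs[i] if labels[j] == labels[i])
               for i in nbrs}
    centers = {}
    for i in nbrs:
        lab = labels[i]
        # keep the point with the strictly largest C value seen so far
        if lab not in centers or c_value[i] > c_value[centers[lab]]:
            centers[lab] = i
    return centers
```

The centers found this way seed the agglomerative reallocation of Step 2, so each cluster regrows from its most "interior" point.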
Step 3. The optimized clusters are converted back into a new particle vector. The k is placed in the first element of the vector, and the group number of each data point is divided by k and placed in the other N elements. The converted particle vector is similar to gbest and pbest in Fig 2. With this technique, the data points are rearranged to be consistent with the characteristics of clustering, while the previous rough distribution is kept as far as possible. In this way, a better clustering solution can be obtained at acceptable computational cost even when the curse of dimensionality exists, greatly enhancing both convergence and solution quality. An example is shown in Fig 6. In the actual search process, a rough partitioning of the clustering problem is first obtained by PSO; the solution-improvement method is then used to obtain an exact partitioning based on the rough solution. It must be emphasized that the improved solutions also help PSO to obtain better partitionings when the particles are updated. A real search process is shown in Fig 7.

Decision making
Similar to leader selection, decision making is an important issue when faced with many Pareto-optimal solutions. In general, a decision maker is required to make a trade-off when deciding which solution to choose from a large finite Pareto set. Here, we use the filter proposed in [42], which discards the solutions at the boundaries of the Pareto set. Based on that, the decision maker finds the most suitable solution using crowding density among the remaining solutions.
The use of the filter is controlled by two angles α_1 and α_2, each defining a ray passing through the origin (-Mdc is negative, so a small value is used as the coordinate of the origin). Each region is described by one angle α_i with respect to the corresponding objective axis, and the Pareto solutions belonging to that region are discarded. The angle α_i is updated in every iteration using a quality evaluation function (please refer to [42] for details). This method filters out the Pareto solutions that barely take into account more than a single objective.
However, it is hard to make a decision among the remaining Pareto solutions because they all represent good trade-offs between the two objective functions. According to the analysis in Section 3.3.1, a sparser Pareto solution has a better k, which makes the clusters well-separated. Therefore, we use a sparse coefficient to choose the sparsest particle among the remaining Pareto solutions. After all the Pareto solutions are normalized, the sparse coefficient is defined as follows.
where sc_i is the sparse coefficient of the ith Pareto solution, and d_{l,i} and d_{r,i} are the Euclidean distances to the nearest Pareto solutions on the left and right, respectively. The Pareto solution with the largest sparse coefficient is taken as the decision solution, as shown in Fig 8.

In summary, the pseudo-code of IMCPSO is as follows.

Initialize swarm randomly                         // using the particle representation
Improve every particle in swarm                   // using the clustering solution-improvement method
for each particle in swarm do
    Evaluate objective functions for particle
end for
Store non-dominated particles in NonDom
for iter in 1 to iter_max do
    Select gbest in NonDom                        // using the leader selection strategy
    Update particles and pbest
    for each particle in swarm do
        Improve particle solution                 // using the clustering solution-improvement method
        Evaluate objective functions for particle
    end for
    Store non-dominated particles in NonDom
end for
Select decision solution in NonDom                // using the decision-making method
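Returning to the decision step, the sparse-coefficient rule can be sketched as follows. This sketch assumes sc_i is the sum of the distances to the two nearest neighbors along the (normalized) front, since the exact formula is not reproduced in the text, and that boundary solutions have already been filtered out.

```python
from math import dist

def sparsest_solution(front):
    """Pick the Pareto solution with the largest sparse coefficient,
    assumed here to be sc_i = d_{l,i} + d_{r,i}, the distances to the
    nearest left and right neighbors along the front."""
    pts = sorted(front)                  # order solutions along the first objective
    best, best_sc = None, -1.0
    for i in range(1, len(pts) - 1):     # the two extremes have no left/right pair
        sc = dist(pts[i - 1], pts[i]) + dist(pts[i], pts[i + 1])
        if sc > best_sc:
            best, best_sc = pts[i], sc
    return best
```

A solution sitting in a sparse stretch of the front has large distances to both neighbors, which, per the analysis in Section 3.3.1, signals a well-separated k.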

Experiments and discussion
This section analyzes the performance of IMCPSO. Firstly, 14 real-world datasets and 14 artificial datasets are analyzed. Secondly, the parameters of IMCPSO are configured and the results are calculated on each dataset. Finally, IMCPSO is compared with eight state-of-the-art single-objective algorithms (four require the number of clusters k and four do not) and one state-of-the-art multi-objective clustering algorithm. The experiments were run in MATLAB R2014a on Windows 7 (x64) with a Core i5 (2.4 GHz) CPU and 4 GB RAM.

Datasets and comparison metrics
There are 28 datasets used in the experiments: 14 are artificial datasets and the other 14 are real-world datasets collected from the UCI Machine Learning Repository [43] and KEEL [44]. A summary of the datasets is given in Table 1, and Fig 9 shows the shapes of all 2-dimensional datasets. The compressed file package of all datasets used in this paper is provided as S1 File. The accuracy of each solution is quantified using the Adjusted Rand Index (ARI) [45], since ground-truth labels exist for each dataset. ARI measures the similarity between the obtained clusters and the true ones and lies in the interval [-1,1]; it takes higher values when the obtained clusters are more similar to the true ones.
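ARI can be computed directly from the contingency table of the two labelings; a self-contained sketch (Python 3.8+ for math.comb):

```python
from math import comb
from collections import Counter

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index: 1.0 for a perfect match (up to relabeling),
    values near 0 for random labelings, negative for worse-than-random."""
    n = len(labels_true)
    # contingency table counts: pairs of (true label, predicted label)
    contingency = Counter(zip(labels_true, labels_pred))
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())   # row sums
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())   # column sums
    expected = sum_a * sum_b / comb(n, 2)      # expected index under chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:                  # degenerate case (e.g. one cluster)
        return 1.0 if sum_ij == max_index else 0.0
    return (sum_ij - expected) / (max_index - expected)
```

Because ARI is invariant to label permutation, swapping all 0s and 1s in the prediction still scores 1.0 against the truth.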

Configuration
The parameters of the proposed algorithm are similar to those of other multi-objective PSO methods, including the inertia weight w, the learning parameters c_1 and c_2, the maximum iteration number iter_max, the number of particles N_p and the maximum number of clusters k_max. We carried out 20 independent experiments with different settings on five datasets to obtain the best configuration. The parameters are set as follows: w = 0.85, c_1 = c_2 = 0.7, iter_max = 500, N_p = 20 and k_max = 20.

Experimental comparisons
The experimental comparison is divided into three parts. First, we compare IMCPSO with single-objective clustering algorithms that determine k automatically. Secondly, we compare it with single-objective clustering algorithms that need k in advance; the correct k is provided to these algorithms to ensure their best performance. Finally, IMCPSO is compared with a state-of-the-art multi-objective clustering algorithm.
Comparison with automatic k-determination clustering methods. Four clustering algorithms with automatic k-determination are mainly considered: AP clustering, Mean Shift, Ward and DBSCAN. After multiple runs on the datasets, the outputs of the above algorithms are compared with IMCPSO in Table 2; the best result among the five methods is shown in bold.
Comparison with k-needed clustering methods. The performance of IMCPSO has been compared with K-means, spectral clustering, average-link and BIRCH, all of which require the number of clusters k in advance. The correct k is fed into these algorithms to ensure their best performance, while IMCPSO does not need k. The results of the comparison are shown in Table 3; the best result among the five methods is shown in bold.
Comparison with the multi-objective clustering method. The state-of-the-art multi-objective clustering algorithm is the MCPSO proposed in [20]. Since the authors do not provide the source code of MCPSO, the ARI indexes on the same datasets are compared; the results are shown in Table 4, with the best result of the two methods shown in bold. In particular, the maximum number of iterations iter_max is 1000 in MCPSO, while it is only 500 in IMCPSO.

Results analysis
On the shape clustering datasets, the baseline clustering methods outperform IMCPSO on only 2 datasets (Jain and D31) in terms of the ARI index, because IMCPSO performs well on arbitrarily shaped clusters. It is worth noting that the number of clusters in D31 is 31, while k_max in the IMCPSO configuration is 20, which is why IMCPSO performs poorly on this dataset. The high-dimensional datasets follow spherical Gaussian distributions, which can be assumed to be the most suitable for AP clustering, Ward, BIRCH and K-means. In particular, K-means and BIRCH with the correct k show high performance on those datasets. Nevertheless, the performance of IMCPSO is still very good, with an overall ARI index above 0.9. In addition, IMCPSO achieves the best performance on 2 datasets (Dim032 and Dim256).
On the real-world datasets, the performance of the k-needed clustering methods is better than that of the automatic k-determination methods, since the correct k is given. Nevertheless, IMCPSO outperforms the k-needed clustering algorithms on 7 datasets (Ecoli, Glass, Haberman, Ionosphere, Iris, Segment, Vehicle). Moreover, IMCPSO outperforms the state-of-the-art multi-objective method (MCPSO) on 10 of the 14 datasets while using fewer iterations.
To examine the statistical differences between the proposed algorithm and the contrastive algorithms, we performed the Wilcoxon signed-rank test [46] on the mean ARI results for each dataset type. The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test used to determine differences between groups of paired data. Table 5 lists the results of the signed-rank tests between IMCPSO and the contrastive algorithms, where N is the number of data and statistically significant results are highlighted in bold. For the shape datasets, the differences between IMCPSO and 7 other methods are significant with p < 0.05 (AP clustering, Mean Shift, Ward, DBSCAN, K-means, Average-link and BIRCH). For the high-dimensional datasets, 4 methods show significant differences from IMCPSO (Mean Shift, DBSCAN, Average-link and spectral clustering). On the real-world datasets, 4 contrastive methods (AP clustering, Mean Shift, DBSCAN and Average-link) show significant differences from IMCPSO. It is worth noting that there is no significant difference between IMCPSO and MCPSO on the shape datasets, although the p value is close to the significance threshold (0.078).
In summary, the performance of IMCPSO is superior to the other clustering methods on the real-world and shape datasets. In particular, it differs significantly from most contrastive algorithms on the shape datasets. On the high-dimensional datasets, the differences between IMCPSO and the other best algorithms are small, while IMCPSO uses fewer iterations and does not need k.

Conclusion
In this study, a multi-objective clustering method, IMCPSO, is proposed. Particle representation, leader selection and decision making are taken into consideration, and clustering is performed automatically. IMCPSO extends multi-objective PSO to the clustering problem, which greatly increases the efficiency of searching for clustering solutions and helps the algorithm avoid becoming trapped in local optima. The performance of IMCPSO has been studied in comparison with eight single-objective clustering methods and a state-of-the-art multi-objective clustering algorithm. The results, obtained over 28 benchmark datasets, show that IMCPSO outperforms the other algorithms in terms of accuracy on the majority of datasets. Moreover, IMCPSO does not need the number of clusters to be provided in advance. In future research, more objective functions will be investigated, and we plan to enhance the performance of IMCPSO on real-world clustering problems. We have already begun this work and found the method efficient for spatial clustering problems.
Supporting information
S1 File. The compressed file package of all datasets used in this paper. (ZIP)