Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing

This paper puts forward a new automatic clustering algorithm, "MOPSOSA", based on multi-objective particle swarm optimization and simulated annealing. The proposed algorithm performs automatic clustering, partitioning a dataset into a suitable number of clusters. MOPSOSA combines the features of multi-objective particle swarm optimization (PSO) and multi-objective simulated annealing (MOSA). Three cluster validity indices are optimized simultaneously to establish both the suitable number of clusters and the appropriate clustering of a dataset. The first cluster validity index is based on Euclidean distance, the second on point symmetry distance, and the third on short distance. A number of algorithms are compared with the MOPSOSA algorithm on their ability to determine the actual number of clusters and an optimal clustering. Computational experiments were carried out on fourteen artificial and five real-life datasets.


Introduction
Data clustering is an important task in the field of unsupervised learning. The clustering technique partitions a dataset into clusters of objects with similar features [1]. To solve a clustering problem, the number of clusters that fits a dataset must be determined, and the objects must be assigned to these clusters appropriately. The number of clusters may or may not be known, which makes it difficult to find the best solution to the clustering problem. As such, the clustering problem can be viewed as an optimization problem. This challenge has led to the proposal of many automatic clustering algorithms in the literature; these algorithms estimate the appropriate number of clusters and partition a dataset into these clusters without needing to know the actual number of clusters [2][3][4][5][6][7][8]. Most of these algorithms rely exclusively on one internal evaluation function (validity index). The validity index is an objective function that evaluates the various characteristics of clusters, reflecting the clustering quality and the accuracy of the clustering solutions [9]. Nevertheless, a single evaluation function is often insufficient to determine the appropriate clusters for a dataset, thus giving an inferior solution [10]. Accordingly, the clustering problem is structured as a multi-objective optimization problem wherein different validity indices can be applied and evaluated simultaneously.
Several automatic multi-objective clustering algorithms have been proposed in the literature to solve the clustering problem. Evolutionary approaches appeared in this area after Handl and Knowles [3] proposed multi-objective clustering with automatic K determination (MOCK). For automatic multi-objective clustering algorithms related to MOCK, the reader can refer to [11][12][13]. A multi-objective clustering technique inspired by MOCK, named VAMOSA, which is based on simulated annealing as the underlying optimization strategy and on the point symmetry-based distance, was proposed by Saha and Bandyopadhyay [5].
Dealing with various shapes of datasets (hyper-spherical, linear, spiral, convex, and non-convex), overlapping datasets, datasets with a small or large number of clusters, and datasets whose objects have few or many dimensions, without being given the proper clustering or the cluster number, is a challenge. Saha and Bandyopadhyay [8] developed two multi-objective clustering techniques (GenClustMOO and GenClustPESA2) by using a simulated annealing-based multi-objective optimization technique and the concept of multiple centers for each cluster, which can deal with different types of cluster structures. GenClustMOO and GenClustPESA2 were compared with MOCK [3], VGAPS [4], K-means (KM) [14], and the single-linkage clustering technique (SL) [15] using numerous artificial and real-life datasets of diverse complexities. However, these algorithms did not achieve the desired high accuracy in clustering the datasets.
The current study proposes an automatic clustering algorithm, namely, hybrid multi-objective particle swarm optimization with simulated annealing (MOPSOSA), which deals with different sizes, shapes, and dimensions of datasets and an unknown number of clusters. The numerical results of the proposed algorithm are shown to be better than those of the GenClustMOO [8] and GenClustPESA2 [8] methods in terms of clustering accuracy (see the Results and Discussions Section). In order to deal with any dataset, determine appropriate clusters, and obtain good solutions with high accuracy, combinatorial particle swarm optimization II [7] is developed to handle three different cluster validity indices simultaneously. The first cluster validity index is the Davies-Bouldin index (DB-index) [16], which is based on Euclidean distance; the second is the symmetry-based cluster validity index (Sym-index) [4], which is based on point symmetry distance; and the last is a connectivity-based cluster validity index (Conn-index) [17], which is based on short distance. If a particle position does not change, or if it moves to a bad position, the MOPSOSA algorithm uses MOSA [18] to improve the searching particle. The MOPSOSA algorithm also utilizes the KM method [14] to improve the selection of the initial particle positions because of their significance in the overall performance of the search process. The algorithm creates a large number of Pareto optimal solutions through a trade-off between the three different validity indices. Therefore, the idea of fitness sharing [19] is incorporated in the proposed algorithm to maintain diversity in the repository that contains the Pareto optimal solutions. Pareto optimal solutions are important for decision makers to choose from. Furthermore, to comply with the decision-maker requirements, the proposed algorithm utilizes a semi-supervised method [20] to provide a single best solution from the Pareto set.
The performance of MOPSOSA is compared with the performances of three automatic multi-objective clustering techniques, namely, GenClustMOO [8], GenClustPESA2 [8], and MOCK [3], and with those of three single-objective clustering techniques, namely, VGAPS [4], KM [14], and SL [15], using 14 artificial and 5 real-life datasets.
The remainder of this paper is structured as follows: Section 2 describes the multi-objective clustering problem; Section 3 illustrates the proposed MOPSOSA algorithm in detail; Section 4 presents the datasets used in the numerical experiments, the evaluation of clustering quality, and the parameter settings for the MOPSOSA algorithm; Section 5 discusses the results; finally, concluding remarks are given in Section 6.

Clustering Problem
The clustering problem is defined as follows: Consider the dataset P = {p_1, p_2, ..., p_n}, where p_i = (p_i1, p_i2, ..., p_id) is a feature vector of d dimensions, also referred to as an object; p_ij is the feature value of object i at dimension j; and n is the number of objects in P. The clustering of P is the partitioning of P into k clusters {C_1, C_2, ..., C_k} satisfying the properties in Eqs (1) to (3). The clustering optimization problem with one objective function can then be formed as min/max_{C ∈ Θ} f(C) such that Eqs (1) to (3) are satisfied, where f is the validity index function, Θ is the feasible solution set that contains all possible clusterings of the dataset P of n objects into k clusters, C = {C_1, C_2, ..., C_k}, and k = 2, 3, ..., n−1.
The multi-objective clustering problem for S different validity indices is defined as min_{C ∈ Θ} F(C) = [f_1(C), f_2(C), ..., f_S(C)], where F(C) is a vector of S validity indices. Note that there may be no single solution that minimizes all the functions f_i(C). Therefore, the aim is to identify the set of all non-dominated solutions.
Definition: Consider C and C* as two solutions in the feasible solution set Θ. The solution C* is said to be dominated by the solution C if f_i(C) ≤ f_i(C*) for all i = 1, ..., S and f_j(C) < f_j(C*) for at least one j. The Pareto optimal set is the set of all non-dominated solutions in the feasible solution set Θ.
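The dominance relation and the extraction of the Pareto optimal set can be sketched as follows; `dominates` and `pareto_front` are illustrative helper names, objective vectors are plain tuples, and all objectives are assumed to be minimized:

```python
def dominates(f_a, f_b):
    """True if objective vector f_a dominates f_b:
    no worse in every objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(f_a, f_b))
            and any(a < b for a, b in zip(f_a, f_b)))

def pareto_front(solutions):
    """Return the non-dominated subset of a list of objective vectors."""
    return [s for i, s in enumerate(solutions)
            if not any(dominates(t, s)
                       for j, t in enumerate(solutions) if j != i)]
```

For example, `pareto_front([(1, 3), (2, 2), (3, 1), (3, 3)])` drops `(3, 3)`, which is dominated by `(2, 2)`, and keeps the other three mutually non-dominated vectors.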

The Proposed MOPSOSA Algorithm
Simulated annealing requires more calculation time than particle swarm optimization does [21], and it requires slow variation of the temperature parameter to obtain a global solution [22]. In particle swarm optimization, on the other hand, some particles may become stagnant and remain unchanged, especially when the objective values of the best personal position and the best global position are similar [21]. Such a particle cannot jump out, which causes convergence toward a local solution and the loss of its capability to search for the optimal Pareto set. This is a disadvantage compared with simulated annealing, which can jump away from a local solution. The proposed MOPSOSA algorithm, as previously mentioned, is a hybrid algorithm that merges the fast calculation and convergence of particle swarm optimization with the capability of simulated annealing to evade local solutions.
The clustering solution X_i is described using label-based integer encoding [23]; each particle position is a clustering solution. The particle position X_i^t = (X_i1^t, ..., X_in^t) and velocity V_i^t = (V_i1^t, ..., V_in^t) are presented as vectors with n components at time t, i = 1,...,m, where n is the number of data objects and m is the number of particles (swarm size). The position component X_ij^t ∈ {1, ..., K_i^t} represents the cluster number of the j-th object in the i-th particle, and V_ij^t ∈ {0, ..., K_i^t} represents the motion of the j-th object in the i-th particle, where K_i^t ∈ {K_min, ..., K_max} is the number of clusters related to particle i at time t (K_min and K_max are the minimum and maximum numbers of clusters, respectively; the default value of K_min is 2, and K_max is √n + 1 unless it is manually specified) [24]. The best previous position of the i-th particle at iteration t is represented as XP_i^t = (XP_i1^t, XP_i2^t, ..., XP_in^t). The leader position chosen from the repository of Pareto sets for the i-th particle at iteration t is represented by XG_i^t = (XG_i1^t, XG_i2^t, ..., XG_in^t). The flowchart in Fig 1 illustrates the general process of the MOPSOSA algorithm, which is described in the following 11 steps:

Step 1: Initialize the algorithm parameters, such as the swarm size m, the number of iterations Iter, the maximum and minimum numbers of clusters, the velocity parameters, and the initial cooling temperature T_0, and set t = 0.
Step 2: Generate the initial particle positions X_i^t using the KM method [14], the initial velocities V_i^t = 0, and the initial XP_i^t = X_i^t, i = 1,...,m.

Step 4: Select the leader XG_i^t from the repository of Pareto sets nearest to the current X_i^t. The clusters in XP_i^t and XG_i^t are renumbered on the basis of their similarity to the clusters in X_i^t, i = 1,...,m.
Step 5: Compute the new Vnew_i and Xnew_i, i = 1,...,m, using XG_i^t, XP_i^t, X_i^t, and V_i^t.
Step 6: Check the validity of Xnew_i, i = 1,...,m, and apply the correction process if it is not valid.
Step 8: Perform a dominance check for Xnew_i, i = 1,...,m: if Xnew_i is not dominated by X_i^t, then X_i^{t+1} = Xnew_i; otherwise, the MOSA technique is applied.

Step 9: Identify the new XP_i^{t+1}, i = 1,...,m.
Step 10: The Pareto set repository is updated.
Step 11: Set t = t + 1; if t ≥ Iter, then the algorithm is stopped and the Pareto set repository contains the Pareto solutions; otherwise, go to step 4.
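Before turning to the details, the outer loop of the steps above can be sketched in strongly simplified form. This is a toy skeleton only: the leader selection, velocity/position operators (steps 4-6), and the MOSA refinement are collapsed into a problem-specific `move` stub, and `evaluate` returns a tuple of objective values to be minimized:

```python
import random

def dominates(fa, fb):
    """Standard Pareto dominance for tuples of objective values."""
    return (all(a <= b for a, b in zip(fa, fb))
            and any(a < b for a, b in zip(fa, fb)))

def mopsosa_skeleton(evaluate, move, init, n_iter=20, seed=0):
    """Toy outer loop: move each particle, accept the move only if it is
    not dominated by the old position (step 8), update the personal bests
    (step 9) and the repository of non-dominated solutions (step 10)."""
    rng = random.Random(seed)
    swarm = init(rng)
    pbest = list(swarm)
    repo = []
    for _ in range(n_iter):
        for i, x in enumerate(swarm):
            cand = move(x, rng)                    # stands in for steps 4-6
            if not dominates(evaluate(x), evaluate(cand)):
                swarm[i] = cand                    # MOSA refinement omitted
            if dominates(evaluate(swarm[i]), evaluate(pbest[i])):
                pbest[i] = swarm[i]
        pool = set(repo) | set(swarm)              # step 10: repository update
        repo = [s for s in pool
                if not any(dominates(evaluate(t), evaluate(s))
                           for t in pool if t != s)]
    return repo
```

On a toy bi-objective problem over the real line (e.g. minimizing the distances to two targets), the repository ends up holding mutually non-dominated positions.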
The following sections will elucidate the steps of the MOPSOSA algorithm.

Particle swarm initialization
Initial particles are generally considered one of the success factors in particle swarm optimization, affecting both the quality of the solution and the speed of convergence. Hence, the MOPSOSA algorithm employs the KM method to improve the generation of the initial swarm of particles. Fig 2 depicts a flowchart for the generation of m particles. Starting with i = 1 and W = min{K_max − K_min + 1, m}: if W = m, then m particles are generated by the KM method with the numbers of clusters K_i = K_min + i − 1, i = 1,...,m. If W = K_max − K_min + 1, then the first W particles are generated by KM with the numbers of clusters K_i = K_min + i − 1, i = 1,...,W, and the remaining particles are generated by KM with the numbers of clusters K_i, i = W+1,...,m, selected randomly between K_min and K_max. For each particle, the initial velocity is set to zero, V_i = 0, and the initial XP_i is set equal to the current position X_i, i = 1,...,m.
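A minimal sketch of this initialization follows, with a bare-bones Lloyd's k-means standing in for the KM method; `kmeans_labels` and `init_swarm` are illustrative names, and labels are 0-based here for simplicity (the paper numbers clusters from 1):

```python
import math
import random

def kmeans_labels(points, k, iters=20, seed=0):
    """Bare-bones Lloyd's k-means; returns only the label vector."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda j: sum((a - b) ** 2
                                        for a, b in zip(p, centers[j])))
                  for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if the cluster emptied
                centers[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return labels

def init_swarm(points, m, k_min=2, k_max=None, seed=0):
    """Generate m particles: the first W use K = k_min, k_min+1, ...,
    the rest use a random K in [k_min, k_max]; V = 0 and XP = X."""
    rng = random.Random(seed)
    if k_max is None:
        k_max = int(math.sqrt(len(points))) + 1  # default K_max = sqrt(n)+1
    w = min(k_max - k_min + 1, m)
    ks = ([k_min + i for i in range(w)]
          + [rng.randint(k_min, k_max) for _ in range(m - w)])
    positions = [kmeans_labels(points, k, seed=seed + i)
                 for i, k in enumerate(ks)]
    velocities = [[0] * len(points) for _ in range(m)]
    pbest = [p[:] for p in positions]
    return positions, velocities, pbest
```

In a full implementation the KM runs would of course use the tuned settings of [14]; this sketch only shows how the cluster counts are spread across the initial swarm.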

Objective functions
The proposed algorithm uses three types of cluster validity indices as objective functions to achieve optimization. These validity indices, the DB-index, Sym-index, and Conn-index, apply three different distances, namely, Euclidean distance, point symmetry distance, and short distance, respectively. Each validity index captures a different aspect of a good solution to the clustering problem. These validity indices are described below. DB-index. This index was developed by Davies and Bouldin [16] and is a function of the ratio of within-cluster scatter (intra-cluster distance) to between-cluster separation (inter-cluster distance). The scatter within the i-th cluster C_i, S_{i,q}, is calculated using Eq (5). The distance between clusters C_i and C_j is denoted by d_{ij,t} and is computed using Eq (6).
where n_i = |C_i| is the number of objects in cluster C_i; c_i is the center of cluster C_i, defined as c_i = (1/n_i) Σ_{p ∈ C_i} p; and q and t are positive integers. DB is defined as DB = (1/k) Σ_{i=1}^{k} max_{j ≠ i} (S_{i,q} + S_{j,q}) / d_{ij,t}. A small value of DB means a good clustering result.
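Since Eqs (5) and (6) are referenced but not reproduced here, the following is a sketch of the standard Davies-Bouldin computation for the common choice q = t = 2; `db_index` is an illustrative name, and this should not be read as the paper's exact parameterization:

```python
import math

def db_index(points, labels):
    """Davies-Bouldin index with q = t = 2; lower values are better."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    centers = {l: tuple(sum(c) / len(ps) for c in zip(*ps))
               for l, ps in clusters.items()}
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # S_i: root-mean-square distance of cluster members to their center
    scatter = {l: math.sqrt(sum(dist(p, centers[l]) ** 2 for p in ps) / len(ps))
               for l, ps in clusters.items()}
    ids = list(clusters)
    return sum(max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                   for j in ids if j != i)
               for i in ids) / len(ids)
```

Two tight clusters that are far apart give a small value; as the clusters spread or approach each other, the index grows.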
Sym-index. The recently developed point symmetry distance d_ps(p,c) is employed in this cluster validity index, Sym, which measures the overall average symmetry with respect to the cluster centers [4]. It is defined as follows. Let p be a point; the reflected symmetrical point of p with respect to a specific center c is 2c − p, denoted by p*. Let the knear unique nearest neighbors of p* be at Euclidean distances d_i, i = 1,...,knear. The point symmetry distance is then d_ps(p,c) = d_sym(p,c) × d_e(p,c), where d_e(p,c) is the Euclidean distance between the point p and the center c, and d_sym(p,c) = Σ_{i=1}^{knear} d_i / knear is a symmetry measure of p with respect to c. In this study, knear = 2. The cluster validity function is defined as Sym(k) = (1/k) × (1/E_k) × D_k, where E_k = Σ_{i=1}^{k} Σ_{j=1}^{n_i} d*_ps(p_j^i, c_i), p_j^i is the j-th object of cluster i, and D_k = max_{i,j=1,...,k} ||c_i − c_j|| is the maximum Euclidean distance between two centers among all cluster pairs. Eq (8) is used with a constraint to compute d*_ps(p_j^i, c_i): the knear nearest neighbors of p*_j, the reflected point of p_j^i with respect to c_i, should belong to the i-th cluster. A large value of the Sym-index means that the actual number of clusters and a proper partitioning have been obtained.
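The point symmetry distance can be sketched directly from the definition above. `d_ps` is an illustrative helper that searches the whole dataset for the knear nearest neighbors of the reflected point; the cluster-membership constraint used for d*_ps is omitted here:

```python
import math

def d_ps(p, c, points, knear=2):
    """Point symmetry distance of p w.r.t. center c:
    d_sym(p, c) * d_e(p, c), with d_sym the mean distance from the
    reflected point p* = 2c - p to its knear nearest neighbors."""
    p_star = tuple(2 * ci - pi for pi, ci in zip(p, c))
    nearest = sorted(math.dist(p_star, q) for q in points)[:knear]
    d_sym = sum(nearest) / knear
    return d_sym * math.dist(p, c)
```

For a perfectly symmetric configuration the reflected point coincides with an actual data point, so the smallest of the knear distances is 0 and d_ps stays small.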
Conn-index. The third cluster validity index used in this study was proposed by Saha and Bandyopadhyay [17]; it depends on the notion of cluster connectedness. To compute the Conn-index, the relative neighborhood graph (RNG) [25] of the dataset has to be constructed first. Subsequently, the short distance between two points x and y, denoted by d_short(x,y), is defined as d_short(x,y) = min_{i=1,...,npath} max_{j=1,...,ned_i} w(ed_j^i), where npath is the number of all paths between x and y in the RNG; ned_i is the number of edges along the i-th path; ed_j^i is the j-th edge in the i-th path; and w(ed_j^i) is the weight of edge ed_j^i, equal to the Euclidean distance d_e(a,b) between its end points a and b. The cluster validity index Conn developed by Saha and Bandyopadhyay [17] is defined in terms of the medoid m_i of the i-th cluster, the point with the minimum average distance to all points in that cluster. A minimum value of the Conn-index means that the clusters are internally connected and well separated from each other. After the particles have been moved to a new position, the three objective functions are computed for each particle in the swarm. The objective functions for a particle position X are {DB(X), 1/Sym(X), Conn(X)}; the three objectives are minimized simultaneously by the MOPSOSA algorithm.
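The short distance is a bottleneck-path quantity: over all paths between x and y, take the path whose largest edge weight is smallest. Rather than enumerating all paths, it can be computed with a minimax variant of Dijkstra's algorithm on an adjacency-list graph; `short_distance` is an illustrative name, and building the RNG itself is omitted:

```python
import heapq

def short_distance(adj, x, y):
    """d_short(x, y): minimize, over all paths from x to y, the maximum
    edge weight along the path (minimax Dijkstra on adjacency lists)."""
    best = {x: 0.0}
    heap = [(0.0, x)]
    while heap:
        bottleneck, u = heapq.heappop(heap)
        if u == y:
            return bottleneck
        if bottleneck > best.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj[u]:
            nb = max(bottleneck, w)
            if nb < best.get(v, float("inf")):
                best[v] = nb
                heapq.heappush(heap, (nb, v))
    return float("inf")  # y unreachable from x
```

On a triangle where the direct a-c edge weighs 5 but the path a-b-c has edges of weight 1 and 2, d_short(a, c) is 2: the detour's bottleneck beats the direct edge.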

XP updating
The previous best position of the i-th particle at iteration t is updated by the non-dominance criterion: XP_i^t is compared with the new position X_i^{t+1}, and three cases are considered. If X_i^{t+1} dominates XP_i^t, then XP_i^{t+1} = X_i^{t+1}. If XP_i^t dominates X_i^{t+1}, then XP_i^{t+1} = XP_i^t. If X_i^{t+1} and XP_i^t are non-dominated, then one of them is chosen randomly as XP_i^{t+1}. This update is applied to each particle.

Repository updating
The repository is utilized by the MOPSOSA algorithm as a guide for the swarm toward the Pareto front; the non-dominated particle positions are stored in it. To preserve the diversity of non-dominated solutions in the repository, fitness sharing [19] is a good method for controlling the acceptance of new entries when the repository is full. Fitness sharing was used by Lechuga and Rowe [26] in multi-objective particle swarm optimization. In each iteration, the new non-dominated solutions are added to the external repository and the dominated solutions are eliminated. If the number of non-dominated solutions exceeds the size of the repository, the sharing fitness is calculated for all non-dominated solutions, and the solutions with the largest sharing-fitness values are selected to fill the repository.
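One way to realize this truncation is sketched below, assuming a standard triangular sharing function with a niche radius `sigma_share`; both the helper names and the exact sharing function are assumptions, since the text defers the details to [19] and [26]:

```python
import math

def shared_fitness(objs, sigma_share=1.0, alpha=1.0):
    """Shared fitness 1/niche_count: solutions in crowded regions of
    objective space accumulate a larger niche count, hence a smaller value."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    fitness = []
    for a in objs:
        niche = sum(max(0.0, 1.0 - (dist(a, b) / sigma_share) ** alpha)
                    for b in objs)
        fitness.append(1.0 / niche)
    return fitness

def truncate_repository(objs, capacity, sigma_share=1.0):
    """Keep the `capacity` solutions with the largest shared fitness."""
    fit = shared_fitness(objs, sigma_share)
    order = sorted(range(len(objs)), key=lambda i: fit[i], reverse=True)
    return [objs[i] for i in order[:capacity]]
```

Given two near-duplicate solutions and one isolated solution, truncation to a single slot keeps the isolated one, which is exactly the diversity-preserving behavior the repository needs.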

Cluster re-numbering
The re-numbering process is designed to eliminate redundant particles that represent the same solution. The proposed MOPSOSA algorithm employs the re-numbering procedure designed by Masoud et al. [7]. This procedure uses a similarity function to measure the degree of similarity between the clusters of two input solutions X_i^t and XP_i^t (or XG_i^t). The two most similar clusters are matched; any cluster in XP_i^t (or XG_i^t) not matched to a cluster of X_i^t is assigned an unused cluster number. The MOPSOSA algorithm uses the similarity function known as the Jaccard coefficient [27], defined as J(C_j, C'_k) = n_11 / (n_11 + n_10 + n_01), where C_j is the j-th cluster in X_i^t, C'_k is the k-th cluster in XP_i^t, n_11 is the number of objects that exist in both C_j and C'_k, n_10 is the number of objects that exist in C_j but not in C'_k, and n_01 is the number of objects that exist in C'_k but not in C_j.
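The Jaccard coefficient and the matching step can be sketched as follows; `renumber` is a hypothetical greedy matcher over label vectors, and the exact matching order used in [7] may differ:

```python
def jaccard(a, b):
    """Jaccard coefficient n11 / (n11 + n10 + n01) of two object sets."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def renumber(reference, other):
    """Greedily relabel the clusters of `other` with the numbers of their
    most-similar clusters in `reference`; unmatched clusters receive
    unused numbers."""
    ref, oth = {}, {}
    for i, (r, o) in enumerate(zip(reference, other)):
        ref.setdefault(r, set()).add(i)
        oth.setdefault(o, set()).add(i)
    pairs = sorted(((jaccard(ref[r], oth[o]), r, o)
                    for r in ref for o in oth), reverse=True)
    mapping, used = {}, set()
    for _, r, o in pairs:            # match most-similar cluster pairs first
        if r not in used and o not in mapping:
            mapping[o] = r
            used.add(r)
    next_label = 1                   # assign unused numbers to leftovers
    for o in sorted(oth):
        if o not in mapping:
            while next_label in mapping.values():
                next_label += 1
            mapping[o] = next_label
    return [mapping[o] for o in other]
```

For instance, relabeling `[2, 2, 1, 1]` against the reference `[1, 1, 2, 2]` yields `[1, 1, 2, 2]`, so the two encodings of the same partition collapse to one representation.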

Velocity computation
The MOPSOSA algorithm employs the expressions and operators modified by Masoud et al. [7]. The new velocity for particle i at iteration t is calculated as Vnew_i = (W ⊗ V_i^t) ⊕ (R_1 ⊗ (XP_i^t ⊖ X_i^t)) ⊕ (R_2 ⊗ (XG_i^t ⊖ X_i^t)), where W, R_1, and R_2 are vectors of n components with values 0 or 1 that are generated randomly with probabilities w, r_1, and r_2, respectively. The operators ⊗, ⊕, and ⊖ are multiplication, merging, and difference, respectively.
• Difference operator ⊖: the difference operation calculates the difference between X_i^t and XP_i^t (or XG_i^t).
• Multiplication operator ⊗: let A = (a_1,...,a_n) and B = (b_1,...,b_n) be two vectors of n components; then A ⊗ B = (a_1 b_1,...,a_n b_n).
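Only ⊗ is fully specified in the text; the componentwise forms of ⊖ and ⊕ from [7] are not reproduced, so the versions below are assumptions (⊖ keeps the first vector's label where the two disagree, ⊕ prefers the first non-zero component) and serve only to make the velocity expression concrete:

```python
import random

def times(a, b):
    """⊗: componentwise multiplication, as defined in the text."""
    return [x * y for x, y in zip(a, b)]

def diff(a, b):
    """⊖ (assumed form): a's component where the vectors disagree, else 0."""
    return [x if x != y else 0 for x, y in zip(a, b)]

def merge(a, b):
    """⊕ (assumed form): a's component where non-zero, else b's."""
    return [x if x != 0 else y for x, y in zip(a, b)]

def new_velocity(v, x, xp, xg, w=0.7, r1=0.5, r2=0.5, rng=None):
    """Vnew = (W ⊗ V) ⊕ (R1 ⊗ (XP ⊖ X)) ⊕ (R2 ⊗ (XG ⊖ X)) with random
    0/1 masks W, R1, R2 drawn with probabilities w, r1, r2."""
    rng = rng or random.Random(0)
    mask = lambda prob: [1 if rng.random() < prob else 0 for _ in v]
    W, R1, R2 = mask(w), mask(r1), mask(r2)
    return merge(times(W, v),
                 merge(times(R1, diff(xp, x)), times(R2, diff(xg, x))))
```

The masks make each object's component stochastically follow the old velocity, the personal best, or the leader, which is the discrete analogue of the usual PSO update.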

Position computation
The MOPSOSA algorithm generates new positions as proposed by Masoud et al. [7]. The new position is generated from the velocity, where r is a random integer in [1, K_i^t + 1] and K_i^t + 1 < K_max; this property enables the particle to add new clusters. The operators above and the differences in the cluster numbers of X_i^t, XP_i^t, and XG_i^t lead to the addition or removal of some clusters in the new position X_i^{t+1}. Sometimes an empty cluster may arise, which leads to an invalid particle position. Such an instance is avoided by resetting the cluster numbering of the particle: the re-numbering process recodes the largest cluster number to the smallest unused one.

MOSA technique
The MOSA method [18] is applied in the MOPSOSA algorithm at iteration t for particle i when X_i^t dominates the new position Xnew_i. Fig 3 presents the flowchart for the MOSA technique applied in MOPSOSA. The procedure for the MOSA technique is explained in the steps below.

Step 1: Let PSX and PSV be two empty sets, niter the maximum number of iterations, and q = 0.

Step 2: Compute the acceptance probability EXP_q, where the cooling temperature T_t is updated in step 8 of the MOPSOSA algorithm. Generate a uniform random number u ∈ (0,1); if u < EXP_q, go to step 7. Otherwise, proceed to the next step.

Step 3: Add Xnew_i to PSX and Vnew_i to PSV; PSX and PSV are then updated to include only non-dominated solutions.

Step 4: If q ≥ niter, then choose a solution randomly from PSX as the new particle position Xnew_i, together with its corresponding velocity Vnew_i from PSV, and proceed to step 7. Otherwise, set q = q + 1 and generate the new velocity Vnew_i and position Xnew_i from the old position X_i^t.

Step 6: Perform a dominance check for Xnew_i: if Xnew_i is not dominated by X_i^t, then proceed to step 7. Otherwise, go to step 2.

Selection of the best solution
In general, a Pareto set containing several non-dominated solutions is provided at the end of the final run of a multi-objective algorithm [28]. Each non-dominated solution introduces a clustering pattern for the given dataset. The semi-supervised method proposed by Saha and Bandyopadhyay [20] is utilized in the MOPSOSA algorithm to select the best solution from the Pareto optimal set. This semi-supervised approach can be applied only when the cluster labels of some points in the dataset are known. The misclassification value is computed using the Minkowski score MS [29]. Let T be the actual solution and C be the selected solution; MS(T,C) then measures the discrepancy between the two clusterings. Low values of MS are better, with the optimal value of MS being 0.
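The paper's MS equation is not reproduced in the text above; the sketch below uses the standard pair-count form of the Minkowski score, where n11 counts object pairs co-clustered in both solutions, n10 pairs co-clustered only in the truth T, and n01 pairs co-clustered only in the candidate C (this standard definition is an assumption about the paper's exact formula):

```python
import math
from itertools import combinations

def minkowski_score(truth, labels):
    """Minkowski score MS(T, C) = sqrt((n01 + n10) / (n11 + n10));
    0 means the candidate clustering matches the truth exactly."""
    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(truth)), 2):
        same_t = truth[i] == truth[j]
        same_c = labels[i] == labels[j]
        if same_t and same_c:
            n11 += 1
        elif same_t:
            n10 += 1
        elif same_c:
            n01 += 1
    return math.sqrt((n01 + n10) / (n11 + n10)) if n11 + n10 else 0.0
```

Because it is pair-based, the score is invariant to how the clusters are numbered, which is exactly what comparing a Pareto solution against reference labels requires.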

Experimental Study
This section presents the datasets used for the experiments, the measurement of solution accuracy, and the parameter settings of the proposed algorithm.

Experimental datasets
The MOPSOSA algorithm is examined on 14 artificial and 5 real-life datasets (S1 File). Table 1 displays the types of datasets, the number of points (objects), the dimensions (features), and the number of clusters. Further details on these datasets are provided below.
• Artificial datasets

• Real-life datasets

3. Newthyroid [32] dataset (Appendix Q in S1 File): This dataset incorporates 215 instances with five laboratory tests distributed over three clusters. The samples are labeled as "thyroid gland data" and embody three categories (i.e., normal, hypo, and hyper).

4. LiverDisorder [32] dataset (Appendix R in S1 File): This dataset represents 345 instances with six laboratory tests distributed over two clusters. The task is to determine whether a person suffers from alcoholism.

5. Glass [32] dataset (Appendix S in S1 File): This dataset involves 214 samples with nine features distributed over six clusters. The field of criminological investigation motivated the study of classifying types of glass: glass left at the scene of a crime can provide evidence if it is correctly identified. In this dataset, the 10th feature (ID number) has been removed.

Evaluating the clustering quality
An external criterion of clustering quality for evaluating the results is presented in this section. The F-measure [33] is selected to evaluate the final solutions obtained from the MOPSOSA, GenClustMOO, GenClustPESA2, MOCK, VGAPS, KM, and SL clustering algorithms.
Let T and C be two clustering solutions: T = {T_1, ..., T_{k_T}} the true solution and C = {C_1, ..., C_{k_C}} the solution to be measured, where k_T and k_C are the numbers of clusters of T and C, respectively. The F-measure of class T_i and cluster C_j is defined as F(T_i, C_j) = 2 P(T_i, C_j) R(T_i, C_j) / (P(T_i, C_j) + R(T_i, C_j)), where P(T_i, C_j) = n_ij / |C_j|, R(T_i, C_j) = n_ij / |T_i|, and n_ij is the number of objects of class T_i inside cluster C_j. Meanwhile, the F-measure of solutions T and C is F(T, C) = Σ_i (|T_i| / n) max_j F(T_i, C_j), where n is the number of objects in the dataset. Higher values of F(T, C) are better, and the optimal value of F(T, C) is 1.

Table 2 presents the parameter settings employed in the proposed MOPSOSA algorithm. The performance of this algorithm is compared with three multi-objective automatic and three single-objective clustering algorithms (i.e., GenClustMOO, GenClustPESA2, MOCK, VGAPS, KM, and SL). These algorithms and the proposed algorithm are executed on all the above-mentioned datasets. Employing the semi-supervised method [20], the GenClustMOO and GenClustPESA2 algorithms select the best solutions from the final Pareto set. Additional details on the standard parameters employed in these algorithms can be found in Saha and Bandyopadhyay [8]. In the MOCK algorithm, the GAP statistic [34] is used to select the best solution. The source code and the standard parameters used in MOCK are available in [3]. The VGAPS, KM, and SL clustering algorithms provide a single solution. In VGAPS, the population size is 100, the number of generations is 60, and the mutation and crossover probabilities are computed adaptively. The total numbers of computations performed by the proposed algorithm, GenClustMOO, GenClustPESA2, MOCK, and VGAPS, as well as the numbers of iterations in KM and SL, are all equal. Each algorithm is implemented 30 times.
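The F-measure computation can be sketched as follows (`f_measure` is an illustrative name; each class T_i is scored against its best-matching cluster, and the per-class scores are weighted by class size):

```python
def f_measure(truth, labels):
    """Overall F-measure F(T, C) = sum_i (|T_i|/n) * max_j F(T_i, C_j)."""
    classes, clusters = {}, {}
    for i, (t, c) in enumerate(zip(truth, labels)):
        classes.setdefault(t, set()).add(i)
        clusters.setdefault(c, set()).add(i)
    n = len(truth)
    total = 0.0
    for t_i in classes.values():
        best = 0.0
        for c_j in clusters.values():
            n_ij = len(t_i & c_j)
            if n_ij:
                p = n_ij / len(c_j)   # precision P(T_i, C_j)
                r = n_ij / len(t_i)   # recall R(T_i, C_j)
                best = max(best, 2 * p * r / (p + r))
        total += len(t_i) / n * best
    return total
```

A clustering that reproduces the true partition (under any cluster numbering) scores exactly 1; merging distinct classes into one cluster pulls the score down through the precision term.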

Results and Discussions
For each algorithm, the average value of the F-measure over the final best solutions is calculated to compare the performance of the proposed algorithm with that of the other algorithms. More information about the cluster numbers and F-measure values of GenClustMOO, GenClustPESA2, MOCK, VGAPS, KM, and SL on the specified datasets can be found in Saha and Bandyopadhyay [8]. Table 3 displays the best F-measure value and the number of clusters automatically obtained for each dataset with the MOPSOSA, GenClustMOO, GenClustPESA2, MOCK, and VGAPS automatic clustering techniques. KM and SL are executed with the actual number of clusters on all datasets.

Discussion of the artificial datasets results
1. Sph_5_2: Table 4 shows that the maximum F-measure value for this dataset was obtained with the MOPSOSA algorithm despite the existence of five overlapping spherical clusters. MOPSOSA, GenClustMOO, GenClustPESA2, and VGAPS established the actual number of clusters, as illustrated in Table 3. Fig 5a shows the clustering of this dataset after the MOPSOSA algorithm was applied.
2. Sph_4_3: The actual number of clusters for this dataset was detected by the MOPSOSA, GenClustMOO, GenClustPESA2, MOCK, and VGAPS clustering algorithms. All seven algorithms also achieved an F-measure value of 1, providing 100% accuracy in the clustering of this dataset (refer to Tables 3 and 4). Fig 5b exhibits the graph of the clusters of Sph_4_3 after the MOPSOSA algorithm was employed.
3. Sph_6_2: The F-measure value for this dataset was determined to be 1 for all seven algorithms (Table 4), signifying the accurate performance of all algorithms. Moreover, all algorithms attained the real number of clusters, as demonstrated in Table 3. Fig 5c depicts the graph of the clusters for this dataset with the application of the MOPSOSA algorithm. (Table 3 lists the F-measure value and the number of clusters for the different datasets obtained by MOPSOSA compared with those acquired by the GenClustMOO, GenClustPESA2, MOCK, and VGAPS algorithms.)

4. Sph_10_2: Table 3 reveals that only the MOPSOSA and GenClustMOO clustering algorithms achieved the desired number of clusters for this dataset. However, the maximum F-measure value was obtained with MOPSOSA (refer to Table 4) despite some overlap in this dataset. Fig 5d shows the graph for the clustering of Sph_10_2 after the application of the MOPSOSA algorithm.

5. Sph_9_2: For this dataset, Table 3 shows that MOPSOSA, GenClustMOO, MOCK, and VGAPS, but not GenClustPESA2, detected the actual number of clusters. Despite the existence of overlaps in all clusters of this dataset, MOPSOSA obtained the maximum F-measure value, demonstrating the accuracy of its performance (refer to Tables 3 and 4).

7. Pat2: Tables 3 and 4 show that the MOPSOSA, GenClustMOO, and GenClustPESA2 clustering algorithms obtained the real number of clusters for this dataset with an F-measure value of 1, signifying the high accuracy of these algorithms on this nonlinear, non-spherical dataset. Fig 5g reveals the graph of the two clusters in Pat2 after the application of the MOPSOSA algorithm. (In Table 4, the best F-measure for each dataset is marked in bold; each algorithm is implemented on 30 independent runs.) Fig 5h presents the clustering of the next dataset into two correct clusters after the application of the MOPSOSA algorithm (refer to Tables 3 and 4).
9. Sizes5: Table 4 reveals that the maximum F-measure value for this dataset was obtained with the MOPSOSA algorithm, which indicates that the proposed algorithm is able to cluster a dataset whose clusters have different sizes. In addition, Table 3 specifies that both MOPSOSA and GenClustMOO identified the actual number of clusters. Fig 5i shows the result of clustering this dataset with the application of the MOPSOSA algorithm.
10. Spiral: Table 4 indicates that an F-measure value of 1 was acquired by MOPSOSA, GenClustMOO, and GenClustPESA2 for this dataset, indicating 100% accurate clustering of the spiral shapes. The MOPSOSA, GenClustMOO, and GenClustPESA2 clustering algorithms also determined the real number of clusters, as shown in Table 3.

12. Square4: Table 3 exhibits that, for this dataset, MOPSOSA, GenClustMOO, GenClustPESA2, and MOCK, but not VGAPS, established the actual number of clusters, with the maximum F-measure value obtained via MOPSOSA (see Table 4). The proposed algorithm was capable of clustering this dataset with high accuracy even though it contains four overlapping clusters. The graph for the clustering of this dataset using the MOPSOSA algorithm is depicted in Fig 5l.

13. Twenty: For this dataset, MOPSOSA, GenClustMOO, MOCK, and VGAPS, but not GenClustPESA2, determined the real number of clusters (see Table 3). MOPSOSA, GenClustMOO, and MOCK obtained an F-measure value of 1, demonstrating extremely high clustering accuracy even with many clusters (refer to Table 4). The clusters for this dataset after the application of the MOPSOSA algorithm are shown graphically in Fig 5m.

14. Fourty: Table 3 reveals that, for this dataset, only three automatic clustering algorithms (MOPSOSA, GenClustMOO, and MOCK) identified the desired number of clusters. All three algorithms also obtained an F-measure value of 1, demonstrating exceedingly high clustering accuracy despite the large number of clusters (refer to Table 4). Fig 5n depicts the graph for the clustering of this dataset after the application of the MOPSOSA algorithm.
Discussion of the real-life datasets results

1. Iris: Table 4 shows that, for this dataset, the maximum F-measure value was obtained with the proposed MOPSOSA algorithm. With the exception of MOCK, all the automatic clustering algorithms (MOPSOSA, GenClustMOO, GenClustPESA2, and VGAPS) resolved the proper number of clusters, as evidenced in Table 3.
2. Cancer: The maximum F-measure value for this dataset was obtained with the proposed MOPSOSA algorithm (see Table 4). In addition, all five automatic clustering algorithms (MOPSOSA, GenClustMOO, GenClustPESA2, MOCK, and VGAPS) identified the correct number of clusters, as illustrated in Table 3.

3. Newthyroid: Table 4 reveals that the maximum F-measure value for this dataset was attained with the MOPSOSA algorithm. However, Table 3 specifies that only two automatic clustering algorithms (MOPSOSA and GenClustMOO) determined the actual number of clusters.
4. LiverDisorder: For this dataset, MOPSOSA, GenClustMOO, MOCK, and VGAPS, but not GenClustPESA2, identified the actual number of clusters (refer to Table 3). Meanwhile, the maximum F-measure was achieved with the proposed MOPSOSA algorithm (refer to Table 4).

5. Glass: Table 4 demonstrates that the maximum F-measure value for this dataset was obtained with the MOPSOSA algorithm. Only the MOPSOSA and GenClustMOO automatic clustering algorithms were capable of achieving the desired number of clusters (see Table 3).

Summary of results
The above results signify that the proposed MOPSOSA algorithm achieves accurate results on all datasets. Moreover, the proposed algorithm can automatically establish the correct cluster numbers for all datasets used in the experiments. The algorithm is also shown to be capable of dealing with various shapes of datasets (hyper-spherical, linear, and spiral), overlapping datasets, datasets with well-separated clusters of any convex or non-convex shape, and datasets that contain many clusters. Across datasets with dimensions from 2 to 9, objects from 150 to 1000, and cluster numbers from 2 to 40, the MOPSOSA algorithm displays superiority over the three multi-objective automatic and three single-objective clustering algorithms.
The results also show that the GenClustMOO algorithm can automatically identify the actual cluster numbers, but with lower clustering accuracy than the proposed algorithm. In general, MOCK can detect the number of clusters for hyper-spherical and linear datasets, but it is unsuccessful for non-convex well-separated and overlapping clusters. The results likewise show that the VGAPS algorithm is unsuitable for non-convex well-separated clusters and for datasets with numerous clusters. The main factors behind the accuracy of the proposed algorithm in solving the clustering problem are the power and speed of the search characteristic of the particle swarm, combined with the guarantee, provided by the MOSA algorithm, of not stagnating in local solutions. Extending the particle swarm to optimize more than one validity index enables it to cluster any dataset; the generation of the initial swarm of particles is improved with the KM method; the repository preserving the diversity of clustering solutions is updated by fitness sharing; and redundant particles are eliminated by the re-numbering process.

Conclusion
This research proposed a new automatic multi-objective clustering algorithm, MOPSOSA, based on a hybrid of a multi-objective particle swarm algorithm and multi-objective simulated annealing. A multi-objective particle swarm optimization was developed from combinatorial particle swarm optimization. The proposed algorithm was shown to be capable of automatically clustering a dataset into the appropriate number of clusters. With the simultaneous optimization of three objective functions, the Pareto optimal set was obtained from the proposed algorithm. The first objective function considered the compactness of the clustering based on Euclidean distance, the second regarded the total symmetry of the clusters, and the third considered the connectedness of the clusters. The proposed algorithm was run on 19 real-life and artificial datasets, and its performance was compared with that of three multi-objective automatic and three single-objective clustering techniques. MOPSOSA obtained better accuracy than the other algorithms. The results also demonstrated that the proposed algorithm can be used for datasets of various shapes, including overlapping and non-convex datasets.