An improved adaptive memetic differential evolution optimization algorithms for data clustering problems

The performance of data clustering algorithms depends mainly on their ability to balance exploration and exploitation during the search process. Although some data clustering algorithms achieve reasonable-quality solutions on some datasets, their performance across real-life datasets could be improved. This paper proposes an adaptive memetic differential evolution optimisation algorithm (AMADE) for addressing data clustering problems. The memetic algorithm (MA) employs an adaptive differential evolution (DE) mutation strategy, which can offer superior mutation performance across many combinatorial and continuous problem domains. We propose that hybridising an adaptive DE mutation operator with the MA can lead to faster convergence and a better balance between exploration and exploitation of the search. We would also expect the performance of AMADE to be better than that of MA and DE executed separately. Our experimental results, based on several real-life benchmark datasets, show that AMADE outperformed the other clustering algorithms under statistical analysis. We conclude that the hybridisation of MA and the adaptive DE is a suitable approach for addressing data clustering problems and can improve the balance between global exploration and local exploitation of the optimisation algorithm.


Introduction
Data clustering is widely used in different applications to understand the structure of the data, to focus on a specific set of clusters for further analysis, and to detect the characteristics of each cluster. It has been developed and used as an essential tool in disciplines such as information retrieval [1], the Internet of Things [2], business [3], medicine [4], and image segmentation [5]. In recent times, clustering methods have been extensively studied [6][7][8]. Clustering methods can be classified based on the fitness function, and the most popular ones are partitioning clustering methods. Partitioning methods attempt to divide the dataset into a set of disjoint clusters by optimising a specific criterion function, which may emphasise the local structure of the data. The most popular partitioning clustering algorithms are k-means, k-medoids, expectation maximisation, clustering large applications and clustering large applications based on randomised search [9].
The K-means algorithm is one of the most popular centre-based clustering algorithms [9] and is recognised as being simple and efficient. However, K-means can detect only well-separated, compact or spherical clusters [10]. It is sensitive to noise because it uses the squared Euclidean distance, so any data object in a cluster can significantly influence the cluster centre. The performance of K-means is also highly sensitive to the selection of initial centres [11]. Improper initialisation may lead to empty clusters, weak convergence or a high possibility of getting trapped in a local optimum [9]. Some researchers have overcome these drawbacks by using meta-heuristics, such as Genetic Algorithms [12], Particle Swarm Optimization [13], Ant Colony Optimization [14], the Black Hole Algorithm [15], the Gravitational Search Algorithm [16] and the Krill Herd algorithm [17].
In clustering problems, the balance between exploration and exploitation affects the ability of the clustering algorithm to find good clusters in the datasets being used [18]. Some of the earlier clustering algorithms based on meta-heuristics managed to find good clustering solutions for specific datasets. However, across all datasets, they were unable to find good results, or the results were not robust [7]. This might be due to an imbalance between exploration and exploitation in the meta-heuristic algorithm, which may lead to premature convergence or stagnation [19]. Some researchers have proposed hybridising a global search with a local search in order to achieve a better balance: the global search handles exploration, while exploitation is handled by the local search [20][21][22][23]. Memetic Algorithms (MAs) are one type of hybrid evolutionary algorithm that offers an efficient optimisation framework by combining perturbation mechanisms, local search strategies, population management [24] and learning strategies [25]. MAs can adopt the strengths of other optimisation algorithms by combining them within the same framework, which can provide better performance and overcome the weaknesses of the individual algorithms. MAs comprise evolutionary phases that aid their success in complex optimisation problems [26][27][28][29]. More specifically, the mutation, improvement and restart phases are primarily responsible for the stability of an MA's performance [30,31].
The differential evolution (DE) algorithm can be hybridised with the MA in the mutation phase, where DE offers superior mutation performance across many combinatorial and continuous problem domains [32,33]. However, the DE algorithm is subject to stagnation problems [34]. Many researchers have applied adaptation to the DE mutation operator, following two main trends: control parameter adaptation [35] and adaptive strategy control [36]. The mutation strategy is important because it can guide the search process towards a global optimum [37]. Therefore, [38] proposed global and local neighbourhood-based mutation operators that balance global and local search throughout the evolutionary process. However, the mutation vectors require well-selected weights for the global and local strategies.

Related work
Many researchers have used nature-inspired algorithms to overcome the shortcomings of the K-means algorithm and to avoid premature convergence. For example, [39] proposed a Gravitational Search Algorithm (GSA) for solving the data clustering problem. The candidate solutions are created randomly and interact with one another via Newton's law of gravity to find optimal solutions in the problem space. Later, [40] proposed a heuristic algorithm based on the black hole phenomenon, which has a simple structure, is easy to implement and is free from parameter tuning.
A hybrid meta-heuristic algorithm was proposed by [41] using Particle Swarm Optimisation and Magnetic Charge System Search algorithms for the partitioning clustering problem. A dynamic shuffled differential evolution algorithm (DSDE) was proposed by [42]. DSDE uses the DE/best/1 mutation strategy and the shuffled frog leaping algorithm to separate the population into two groups during the evolving process.
In recent research, authors have presented data clustering algorithms that integrate the K-means data clustering algorithm with population-based meta-heuristic algorithms. For example, Abdeyazdan presented an enhanced data clustering approach that combines the K-harmonic means algorithm (KHM) and a modified version of the Imperialist Competitive Algorithm (ICA) [43]. Gong et al. presented an improved Artificial Bee Colony clustering algorithm by enhancing the selection of the initial clustering centres [44]. Mustafi et al. presented an improved Genetic Algorithm (GA) data clustering algorithm to overcome the drawbacks of the K-means clustering algorithm [12]. Niu et al. integrated Particle Swarm Optimisers (PSOs) with the K-means algorithm [13]. Pandey et al. proposed an Improved Cuckoo Search data clustering algorithm that adopts K-means [45]. In [46], the authors proposed an improved Tabu Search strategy that is integrated with the K-means clustering algorithm. More recently, [19] combined the K-Harmonic Means (KHM) algorithm with PSO and an improved Cuckoo Search (ICS), using ICS and PSO to avoid falling into local optima.
Although the modified data clustering algorithms based on evolutionary approaches perform better than earlier algorithms, some evolutionary algorithms still suffer from weak convergence. More precisely, the balance between exploitation and exploration in the evolutionary algorithms can be further improved. Therefore, in this work, we aim to improve the clustering algorithm using an adaptive memetic differential evolution, named AMADE.

Contribution of this paper
The main objective of this paper is to address the issues discussed above by proposing an adaptive memetic differential evolution for solving data clustering problems. Specifically, the significance of our contribution is three-fold.
1. We design an adaptive memetic differential evolution algorithm for the data clustering problem. The proposed data clustering algorithm combines MA and DE in order to solve the data clustering problem.
2. We develop an adaptive DE mutation phase with an adaptive mutation strategy that narrows the search process, over the evolutionary generations, towards the nearest possible centroids.
3. We develop a local search algorithm utilising a neighbourhood selection heuristic that seeks better centroids based on the maximum and minimum values of each attribute centroid.
More specifically, the proposed algorithm combines an adaptive DE mutation operator with the evolutionary steps of the memetic algorithm. The mutation operator strengthens the search capabilities of DE through the proposed DE mutation strategy, and the algorithm introduces an adaptive strategy to avoid stagnation. The DE mutation phase employs an adaptive DE/current-to-best/1 mutation strategy to speed up the convergence of the differential evolution algorithm under the guidance of both the current and the best individuals.
The memetic improvement phase includes two steps: removing duplicate solutions, and local search using an improved neighbourhood search heuristic. This modification aims to prevent the algorithm from falling into premature convergence. A hill-climbing local search algorithm seeks better centroids by employing an improved neighbourhood selection heuristic with a first improvement strategy. The neighbourhood selection heuristic seeks better centroids based on the maximum and minimum values of each attribute centroid. The restart phase is modified to replace part of the population with good solutions generated by the discrete differential evolution algorithm, which keeps the diversity of the population as high as possible.

Organization of This Paper
This paper is organised as follows. Section 2 introduces the theoretical background and concepts such as standard MA and DE. Section 3 briefly explains the improved adaptive memetic DE. Section 4 presents the experimental results of the proposed algorithm. Finally, Section 5 provides the conclusion and future work.

Background
This section discusses the fundamental aspects of the cluster analysis problem, differential evolution (DE) and memetic algorithms, which are used in the proposed data clustering algorithm. It also covers the relevant population-based approaches in data clustering.

Cluster analysis
Data clustering is the process of partitioning a set of n objects into K clusters, based on a specific similarity measure. The n objects are represented by the set $X = \{x_1, x_2, \dots, x_n\}$ and the K clusters are denoted by $C = \{C_1, C_2, \dots, C_K\}$, such that data objects in the same cluster are similar and data objects in different clusters are dissimilar. In the data clustering problem, clusters must maintain the following three hard constraints [47]:
i. Each cluster should consist of at least one object: $C_i \neq \emptyset$ for all $i \in \{1, \dots, K\}$.
ii. Different clusters should not have objects in common: $C_i \cap C_j = \emptyset$ for all $i \neq j$.
iii. Every object must be attached to a cluster: $\bigcup_{i=1}^{K} C_i = X$.
The data clustering problem can be represented as in Eq (4), where f(X, C) is the fitness function that measures the quality of the partitions generated by the clustering method. The fitness function can be maximised or minimised depending on the similarity/dissimilarity measure used, and it should be defined so that it yields an adequate partitioning. The intra-cluster distance is one of the most popular internal metrics for measuring the quality of a clustering solution [7], as in Eq (5), where $d(O_i, Z_l)$ represents the distance between the centre of cluster $Z_l$ and data object $O_i$. The Euclidean distance, as in Eq (6), is one of the most widely used distance functions [7]. It measures the distance between two objects ($O_i$ and $O_j$) inside the same cluster:

$$d(O_i, O_j) = \sqrt{\sum_{p=1}^{D} \left(O_{ip} - O_{jp}\right)^2} \tag{6}$$

where D is the number of attributes and $O_{ip}$ is the value of attribute p of object $O_i$. Furthermore, the centre $Z_l$ is determined as the mean value of all objects in the cluster, as in Eq (7), where the number of data objects in cluster $Z_l$ is denoted by $n_l$.
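The quantities described above can be illustrated with a short sketch. It assumes the common "sum of object-to-centre distances" reading of the intra-cluster distance and 0-based cluster labels; the function names are hypothetical and are not taken from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroids(data, labels, k):
    """Mean of the objects assigned to each cluster (the role played by Eq (7))."""
    dims = len(data[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for point, label in zip(data, labels):
        counts[label] += 1
        for p in range(dims):
            sums[label][p] += point[p]
    # Hard constraint i: every cluster must hold at least one object.
    if min(counts) == 0:
        raise ValueError("empty cluster")
    return [[s / counts[l] for s in sums[l]] for l in range(k)]

def intra_cluster_distance(data, labels, k):
    """Sum of distances from each object to its own cluster centre
    (assumed form of the intra-cluster distance)."""
    centres = centroids(data, labels, k)
    return sum(euclidean(point, centres[label])
               for point, label in zip(data, labels))

# Hypothetical toy data: four objects, two attributes, two clusters.
data = [[1.0, 1.0], [1.0, 3.0], [5.0, 5.0], [7.0, 5.0]]
labels = [0, 0, 1, 1]
print(centroids(data, labels, 2))               # [[1.0, 2.0], [6.0, 5.0]]
print(intra_cluster_distance(data, labels, 2))  # 4.0
```

Each object in the toy data lies exactly one unit from its cluster centre, so the total is 4.0. A lower value indicates more compact clusters.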

Differential evolution algorithm (DE)
The DE algorithm is an effective meta-heuristic optimisation algorithm for solving continuous and combinatorial optimisation problems [48]. The algorithm starts the evolutionary process by initialising a population of individuals in the solution space. In each generation, individuals are selected as parents for the mutation and crossover operators in order to generate trial offspring individuals. In the mutation phase, a base individual is perturbed by a scaled difference vector, built from several individuals randomly selected from the population, to produce a mutant individual. The offspring individual is then compared with its parent using the fitness value, and the superior one is chosen as the new individual for the next generation. The evolutionary process terminates when the termination condition is satisfied, and the best individual in the last generation is the solution to the problem.
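The loop described above can be sketched with the classic DE/rand/1/bin scheme. This is a generic textbook variant for a continuous minimisation problem, not the adaptive operator proposed in this paper; the function name, the default parameter values and the sphere test function are assumptions made for illustration.

```python
import random

def differential_evolution(fitness, bounds, pop_size=20, f=0.7, cr=0.9,
                           generations=200, seed=0):
    """Minimise `fitness` over the box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dims = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals other than the target.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[r1][d] + f * (pop[r2][d] - pop[r3][d])
                      for d in range(dims)]
            # Binomial crossover; j_rand guarantees at least one mutant gene.
            j_rand = rng.randrange(dims)
            trial = [mutant[d] if (rng.random() < cr or d == j_rand) else pop[i][d]
                     for d in range(dims)]
            # Clamp the trial vector to the search bounds.
            trial = [min(max(v, lo), hi) for v, (lo, hi) in zip(trial, bounds)]
            # Selection: the better of parent and offspring survives.
            score = fitness(trial)
            if score <= scores[i]:
                pop[i], scores[i] = trial, score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best, score = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
print(best, score)
```

Because selection never accepts a worse offspring, the best fitness in the population is non-increasing from one generation to the next.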
The effectiveness of DE in solving complicated optimisation problems depends mainly on choosing a suitable mutation strategy and the related parameter values. Therefore, choosing suitable control parameter values for the DE algorithm is an essential task, and it has attracted many researchers. For example, [35] proposed the DE-PAS algorithm for selecting and incorporating a suitable parameter adaptation scheme. [49] proposed a network intrusion detection approach based on an efficient feature selection technique using a decision tree algorithm and discretised differential evolution (DDE) on standard intrusion datasets. [50] presented a DE algorithm that can avoid premature convergence and improve the search quality. The population is grouped into many tribes and utilises an ensemble of different mutation and crossover strategies, with an adaptive scheme to control the scaling factor and the crossover rate. [51] introduced a self-adaptive differential evolution algorithm (APDDE), which integrates the detecting values into two mutation strategies to produce the offspring population. [36] proposed a self-adaptive differential evolution algorithm with a hybrid mutation operator (SHDE) for the parameter identification problem. In [52], the researchers proposed a self-adaptive DE that predicts the control parameters based on an ensemble. [53] proposed a self-adaptive DE algorithm with discrete mutation control parameters (DMPSADE), in which every individual carries its own mutation control parameter, crossover control parameter and mutation strategy.

Memetic algorithms
Memetic Algorithms (MAs) are a meta-heuristic approach that combines problem-specific solvers with evolutionary algorithms. The problem solvers can be implemented using exact methods, approximation algorithms or local search heuristics. The hybridisation aims to accelerate the discovery of good solutions or to find solutions that are unreachable by evolutionary algorithms or local search methods alone. MAs have proven successful in a broad range of problem domains, such as wireless sensor networks [54], machine learning algorithms [55], scheduling problems [56], routing problems [57] and bioinformatics [58]. MAs have received many names throughout the literature, including hybrid GA, Baldwinian EA, Lamarckian EA and genetic local search algorithms [59]. MAs can combine techniques from many search approaches, most notably local search methods and population-based search techniques. The basic MA template includes the following procedures:
The initialisation procedure. The initialisation procedure is responsible for creating the initial population of solutions. MAs seek to start from high-quality solutions, so the initialisation can use either a local search procedure or a constructive heuristic to improve the random initial solutions.
The cooperate and improve procedures. The cooperate and improve procedures typically rely on selecting solutions from the population and recombining them. Both procedures utilise a local search approach within the population.
The compete procedure. The compete procedure reconstructs the current population from the old and the new populations. A steady-state replacement strategy is one of the most popular strategies; it can be used when the fitness function is complex and time-consuming to evaluate, and it can lead to faster convergence [59].
The restart procedure. The restart procedure is invoked whenever the population falls into a degenerate state. Typically, one strategy is to keep part of the current population and generate the remaining part from new solutions. Another approach is to apply a heavy mutation operator, which can generate a population that differs from the current state in the search space.

Improved adaptive memetic differential evolution
This section discusses the detailed steps of the proposed AMADE algorithm along with the solution representation.

Solution representation
The encoding aims to determine which data objects belong to each cluster so that the clustering analysis can be performed effectively. A label-based one-dimensional array is used to represent a candidate solution in the data clustering optimisation problem. Every solution is a set of N data objects, where each cell holds the cluster number associated with that object. Fig 1 presents an example of this representation.
Additionally, a centroid-based representation that consists of a two-dimensional matrix is used to keep track of the positions of the cluster centroids and is used by the local search. The matrix consists of K rows and D columns, where K is the total number of clusters and D is the total number of attributes in the dataset. For example, in Fig 2, the dataset contains two clusters and two attributes; the position of the first cluster centroid is (4.5, 2.3), and the position of the second cluster centroid is (5.5, 7.4).
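As a small illustration of how the two representations relate, the sketch below derives a label-based array from a centroid matrix by assigning each object to its nearest centroid. The centroid values follow the two-cluster example in the text; the data objects, the 0-based labels and the function name are hypothetical.

```python
import math

# Centroid-based representation: K rows (clusters) by D columns (attributes).
centroid_matrix = [[4.5, 2.3],
                   [5.5, 7.4]]

def to_labels(data, centroid_matrix):
    """Label-based representation: cell i holds the cluster index of object i."""
    labels = []
    for point in data:
        distances = [math.dist(point, centre) for centre in centroid_matrix]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical data objects, used only for illustration.
data = [[4.0, 2.0], [5.0, 3.0], [5.0, 7.0], [6.0, 8.0]]
print(to_labels(data, centroid_matrix))  # [0, 0, 1, 1]
```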

Constraint handling
The solution representation of the proposed AMADE guarantees that each data object is associated with only one cluster. An additional soft constraint prevents duplicate solutions in the population, because duplicate solutions can lead to premature stagnation during the evolutionary process. Duplicate solutions are handled in the improvement phase by generating solutions randomly.

The AMADE proposed approach
In AMADE, the DE mutation operator with the adaptive DE/current-to-best/1 strategy is combined with the evolutionary steps of the memetic algorithm; this aims to achieve faster convergence under the guidance of the best individual. The new individuals are compared with the target vector, which can improve the guidance of the population evolution. However, AMADE may suffer from premature convergence. To this end, the restart phase prevents premature convergence by restoring population diversity through newly generated solutions. The improvement phase plays a key role in finding better solutions and improves solution quality using the proposed improvement heuristic. The pseudo-code for the proposed AMADE algorithm is shown in Fig 3.

The population initialisation phase
To maintain good population diversity, a random constructive method is used, so the initial solutions for the proposed AMADE are randomly generated. The data points of the dataset are randomly grouped into K clusters, where K is the total number of clusters, and the centroid of each cluster is calculated using Eq (7). These two steps are repeated N times to generate N random solutions, where N is the population size parameter of the AMADE algorithm.
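A minimal sketch of this initialisation is given below. Seeding every cluster with one object before filling the rest at random is an assumption added here so that no cluster is empty; the function names and the 0-based labels are also hypothetical. The centroids of each solution would then be computed as cluster means.

```python
import random

def random_solution(n_objects, k, rng):
    """One label-based solution in which every cluster is non-empty."""
    if n_objects < k:
        raise ValueError("need at least one object per cluster")
    # Assumption: seed each cluster with one object, then assign the
    # remaining objects uniformly at random and shuffle the positions.
    labels = list(range(k)) + [rng.randrange(k) for _ in range(n_objects - k)]
    rng.shuffle(labels)
    return labels

def initial_population(n_objects, k, pop_size, seed=0):
    """Repeat the random grouping pop_size times."""
    rng = random.Random(seed)
    return [random_solution(n_objects, k, rng) for _ in range(pop_size)]

# Example sizes: 150 objects, 3 clusters, population of 20.
population = initial_population(n_objects=150, k=3, pop_size=20)
print(len(population), len(population[0]))  # 20 150
```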

The recombination phase
The recombination phase employs the mating pool approach [60] from evolutionary computation, with a pool of size CandPoolSize. Tournament selection [61], with tournament size TourSize, is applied to the entire population, and the selected individuals are placed into the mating pool. Thus, a two-point

The DE Mutation phase
The DE mutation phase employs an adaptive DE/current-to-best/1 mutation strategy to speed up the convergence of the DE algorithm under the guidance of both the current and the best individuals. The cluster centroids are modified by the mutation phase to achieve a better cluster solution, as shown in Fig 6. This is performed using Eq (8), where C_current is the current individual centroid, C_best is the best individual centroid, C_rand is a random individual centroid, CurrIteration is the current iteration of the AMADE algorithm, and MaxIterations is the maximum number of iterations of AMADE.
This adaptive strategy narrows the search process, over the evolutionary generations, towards the nearest possible centroids. In the same phase, the data objects are reassigned to the closest clusters after the cluster centroids have been modified. The newly produced individual is immediately compared with the target vector in the current population, and the better individual is retained.
In order to demonstrate the effectiveness of the adaptive DE strategy, Fig 7 shows an example of a cluster centroid of value 6.5 that is adjusted throughout 1000 iterations of the AMADE algorithm. The adaptive DE strategy provides more exploration capability early on: in the first iteration the centroid is 6.5 and is adjusted to a new centroid of 11.5. As the algorithm approaches the maximum number of iterations, the DE strategy provides more exploitation capability: in iteration 999 the centroid is 10.1 and the new centroid is 10.1004.
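The following is a minimal sketch of how an iteration-dependent DE/current-to-best/1 step might behave. It is not the paper's Eq (8): the linear decay weight, the scale factor f and the way the random centroid enters the update are assumptions chosen only to illustrate the shift from large exploratory moves to small exploitative ones.

```python
def adaptive_current_to_best(current, best, rand, curr_iteration,
                             max_iterations, f=0.5):
    """Move `current` towards `best` and `rand` with a shrinking step.

    ASSUMPTION: the step weight decays linearly from 1 to 0 over the run.
    """
    w = 1.0 - curr_iteration / max_iterations
    return [c + w * (f * (b - c) + f * (r - c))
            for c, b, r in zip(current, best, rand)]

# Hypothetical one-attribute centroids.
current, best, rand = [6.5], [10.0], [8.0]
early = adaptive_current_to_best(current, best, rand, 0, 1000)
late = adaptive_current_to_best(current, best, rand, 999, 1000)
print(early, late)
```

With these inputs the first call moves the centroid by 2.5 units, while the call at iteration 999 moves it by only 0.0025 units, which mirrors the qualitative behaviour described for Fig 7.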

The improvement phase
The improvement phase consists of two solution quality improvement steps: the clear duplicate solutions step and the local search step. In the clear duplicate step, the algorithm ensures that the population retains better solution diversity in order to avoid premature stagnation. In the second step, a hill-climbing local search [63] is applied to the centroid-based representation by replacing the current centroids with better cluster centroids. The hill-climbing local search algorithm, as shown in Fig 8, seeks better centroids by utilising the neighbourhood selection heuristic with a first improvement strategy [64]. The algorithm terminates when the current solution is improved.
The neighbourhood selection heuristic, shown as pseudo-code in Fig 9 and as a flowchart in Fig 10, seeks better centroids based on the maximum and minimum values of each attribute's centroid. The heuristic increases the centroid value by an increment step until it finds a better centroid. Otherwise, the algorithm reverses the search direction and decreases the value towards the minimum value of the centroid.
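A first-improvement search of this kind can be sketched for a single centroid as follows. This is a simplified reading of the heuristic, not a transcription of Fig 9: the fixed step size, the per-attribute bounds passed as lists and the attribute-by-attribute order are all assumptions.

```python
def first_improvement_search(centroid, lower, upper, step, fitness):
    """Return the first neighbouring centroid that lowers `fitness`,
    or a copy of the input when no neighbour improves it."""
    base = fitness(centroid)
    for d in range(len(centroid)):
        # Search upwards towards the attribute maximum first,
        # then downwards towards the attribute minimum.
        for direction, limit in ((+1, upper[d]), (-1, lower[d])):
            candidate = list(centroid)
            value = centroid[d] + direction * step
            while (value <= limit) if direction > 0 else (value >= limit):
                candidate[d] = value
                if fitness(candidate) < base:
                    return candidate  # first improvement: stop at once
                value += direction * step
    return list(centroid)

def toy_fitness(c):
    """Hypothetical objective with its minimum at (3, 1)."""
    return (c[0] - 3.0) ** 2 + (c[1] - 1.0) ** 2

print(first_improvement_search([1.0, 1.0], [0.0, 0.0], [5.0, 5.0], 0.5, toy_fitness))
print(first_improvement_search([4.0, 1.0], [0.0, 0.0], [5.0, 5.0], 0.5, toy_fitness))
```

In the first call the upward direction improves immediately; in the second, the upward scan fails up to the maximum and the search then finds an improvement in the downward direction.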

The restart population phase
Once the population reaches a degenerate state, the restart procedure is employed immediately [59]. The restart strategy keeps part of the population and replaces the remaining individuals with newly generated solutions. As shown in Fig 11, AMADE keeps 75% of the population for the next evolutionary steps, while the remaining population is generated using a DE algorithm based on the DE/rand/1 mutation strategy and a minimal number of generations, which can produce a new population with better diversity and good-quality solutions.
The DE algorithm, shown in Fig 12, is applied to the solution representation with a discrete mutation operator. Each gene in the new chromosome is calculated using Eq (9), where G_rand1, G_rand2 and G_rand3 are the corresponding genes in the chromosomes of randomly selected individuals. The modulus is used to ensure that the result of the equation stays within the number of clusters in the dataset.
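A discrete DE/rand/1 step on label-based chromosomes might look like the sketch below. The exact form of Eq (9) is not reproduced here; combining the genes as `g1 + round(f * (g2 - g3))`, the rounding rule and the 0-based labels are assumptions, and only the use of the modulus to wrap the result into the valid cluster range comes from the text.

```python
def discrete_rand_1(g1, g2, g3, k, f=0.7):
    """Combine three parents gene by gene and wrap each result into 0..k-1."""
    # Python's % always returns a non-negative result for a positive k,
    # so negative intermediate values also wrap into the valid range.
    return [(a + round(f * (b - c))) % k for a, b, c in zip(g1, g2, g3)]

# Hypothetical parent chromosomes for a dataset with k = 3 clusters.
parent1 = [0, 1, 2, 2, 0]
parent2 = [2, 0, 1, 2, 0]
parent3 = [0, 2, 1, 0, 2]
child = discrete_rand_1(parent1, parent2, parent3, k=3)
print(child)  # [1, 0, 2, 0, 2]
```

The last two genes show the wrap-around: 2 + 1 becomes 0 and 0 - 1 becomes 2, so every gene remains a valid cluster label.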

Experimental setup
The performance of the proposed AMADE clustering method is investigated on six real datasets of varying complexity from the UCI repository of machine learning databases [65], which can be downloaded at http://archive.ics.uci.edu/ml/index.php. The datasets used are Wisconsin Breast Cancer, Vowel, Wine, Iris, Contraceptive Method Choice (CMC) and Glass, as shown in Table 1. The datasets cover different complexity levels, classified from 1 to 10 based on the number of instances and attributes [66], where level 1 is the lowest complexity level and level 10 is the highest.
In order to evaluate the effectiveness of the proposed memetic DE algorithm and its evolutionary phases, the performance of AMADE is first compared with DE [48] with the DE/best/1/bin strategy, hybrid DE (HyDE) with the DE/best/1/bin strategy, GA [67] and hybrid GA (HyGA), where all algorithms are applied with the same experimental setup and local search heuristic. These algorithms have the same evolutionary phases as AMADE except for the restart phase. This selection of algorithms is essential to show the strength of combining such algorithms in an MA, in addition to the proposed adaptive mutation operator and the modified restart phase. Moreover, to further verify its performance, AMADE is compared with recent data clustering algorithms in the literature, including K-means [9], black hole [40], age-based particle swarm optimisation [68], the dynamic shuffled differential evolution algorithm [42], the krill herd algorithm [69] and hybrid ICMPKHM [19].
The algorithm's performance is evaluated using the following criteria:
• The intra-cluster distance: an internal quality measure of the distance between all objects in a cluster and its centre, as defined in Eq (5). The purpose of the data clustering algorithm is to minimise the sum of intra-cluster distances, which leads to high clustering quality. The intra-cluster distance is reported as the best (minimum), average and worst (maximum) objective function value over all runs.
• The F-measure: an external measure that compares the ground truth with the obtained clusters to calculate the similarity between them. A higher F-measure value indicates better clustering quality. The precision and recall of cluster S_j and class R_i, for i, j = 1, 2, ..., k, are shown in Eq (10) and Eq (11), where |R_i| is the number of objects in class R_i, |S_j| is the number of data objects in cluster S_j, and L_ij is the number of data objects of class R_i in cluster S_j. The F-measure of a class R_i is defined in Eq (12). The overall F-measure, computed as the weighted average over all classes, is given in Eq (13).
• The accuracy: an external measure that indicates the proportion of data objects that are correctly placed by the predictive model to match the class (ground truth) in the data, as shown in Eq (14):

$$\text{Accuracy}(k) = \frac{\text{number of correctly identified data objects}}{\text{total number of data objects}} \tag{14}$$
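The F-measure criterion listed above can be sketched as follows. The precision and recall follow the definitions in the text (L_ij divided by |S_j| and by |R_i| respectively); taking the class F-measure as the maximum over clusters is the common convention for clustering and is assumed here, as are the function name and the toy labels.

```python
def f_measure(truth, predicted):
    """Overall F-measure: class-size-weighted average of per-class scores."""
    n = len(truth)
    total = 0.0
    for r in sorted(set(truth)):
        size_r = sum(1 for t in truth if t == r)           # |R_i|
        best = 0.0
        for s in sorted(set(predicted)):
            size_s = sum(1 for p in predicted if p == s)   # |S_j|
            l_rs = sum(1 for t, p in zip(truth, predicted)
                       if t == r and p == s)               # L_ij
            if l_rs == 0:
                continue
            precision = l_rs / size_s
            recall = l_rs / size_r
            # Assumption: a class is scored by its best-matching cluster.
            best = max(best, 2 * precision * recall / (precision + recall))
        total += size_r / n * best
    return total

# Hypothetical labels: cluster numbers need not match class numbers.
print(f_measure([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(f_measure([0, 0, 1, 1], [0, 0, 0, 1]))
```

The first call shows that a perfect partition scores 1.0 even when the cluster indices are permuted relative to the class indices.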
The parameter settings for the AMADE algorithm were tested independently 31 times on each of the six datasets, and the best, worst and average values, standard deviations and F-measure were computed. In AMADE, the maximum number of generations is set to 1000, and to 100 for DE/rand/1 in the population restart phase. The scaling factor F is set to 0.7 and the crossover rate Cr is set to 0.9.
A Taguchi method [70,71] for the design of experiments was used to identify the best parameter values for the AMADE algorithm. Five levels were considered for each factor, as shown in Table 2. The AMADE algorithm was run 31 times for each factor at each level, and the mean signal-to-noise (SN) ratio for each level of the factors is plotted in Fig 13. The level with the maximum SN ratio is the optimum parameter value determined by the Taguchi method.
According to Fig 13, the optimum population size is 20, and the maximum number of generations without improvement is 50. The recombination mating pool size is set to 10, and the tournament selection pressure is set to 10. All algorithms were implemented in Oracle Java 1.8 and run on a personal computer with an Intel Core i7 CPU (2.4 GHz) and 8 GB of RAM.
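To illustrate how a level is picked from SN ratios, the sketch below uses the standard "smaller-the-better" SN formula, which suits a minimised objective such as the intra-cluster distance. The text does not state which SN variant was used, so this choice is an assumption, and the run values are invented purely for illustration.

```python
import math

def sn_smaller_is_better(values):
    """Taguchi smaller-the-better SN ratio: -10 * log10(mean of squares)."""
    return -10.0 * math.log10(sum(v * v for v in values) / len(values))

# Hypothetical objective values for three levels of one factor.
runs_by_level = {
    10: [100.0, 102.0, 101.0],
    20: [97.0, 97.5, 96.8],
    30: [98.0, 99.0, 98.5],
}
sn = {level: sn_smaller_is_better(vals) for level, vals in runs_by_level.items()}
best_level = max(sn, key=sn.get)
print(best_level)  # 20
```

The level whose runs have the smallest objective values receives the largest SN ratio and is therefore selected.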

Experimental results and discussion
Table 3 compares the objective function values of the best, average and worst solutions, together with the F-measure, standard deviation and execution time, over 31 runs. Best, Mean and Worst refer to the intra-cluster distance objective function values obtained over the 31 runs, where a smaller value is better; for the F-measure, a higher value is better. The results show that the proposed algorithm has smaller best, average, worst and standard deviation values than the other algorithms. For example, on the Iris dataset AMADE reached the global optimum of 96.544, whereas the best solutions of GA, DE, HyGA and HyDE are 97.225, 97.101, 96.571 and 96.571, respectively. The best and average results of HyGA and HyDE are close to those of AMADE on most of the datasets, but these algorithms did not perform as well on the standard deviation and the worst solutions. Moreover, the F-measure of the proposed algorithm is better than that of the other algorithms on most datasets, except for the Iris and Cancer datasets, where the values are similar to the global optimum. Furthermore, there is a trade-off between solution quality and time cost. The hybrid metaheuristic approaches, such as AMADE, HyDE and HyGA, can obtain optimal solutions in reasonable execution time. In contrast, traditional metaheuristic algorithms, such as GA and DE, are not guaranteed to find the optimal solution, but they usually obtain sub-optimal, good-quality solutions in less execution time. As shown in Table 3, the traditional GA and DE algorithms achieve the best execution time on all datasets, but they were unable to obtain the optimal solution for the datasets. In contrast, the AMADE algorithm produced the optimal intra-cluster distances and F-measure with reasonable execution time when compared with HyDE and HyGA.
For example, on the CMC dataset AMADE obtained an average intra-cluster distance of 5532.620 and an F-measure of 0.52107 in 59.485 seconds, which were the optimal results with the best execution time compared with HyDE (72.470) and HyGA (86.887). Fig 14 shows the convergence curves of the first 200 iterations on the six datasets. It demonstrates that AMADE has the best convergence rate on the six datasets, converging faster in the early iterations of the search process and more slowly later. HyDE achieved the second-best convergence rate, and HyGA the third-best. The GA and DE algorithms converged slowly towards the optimum intra-cluster distance on all datasets. In general, the improved memetic phases, which remove duplicated solutions and apply the local search and the adaptive strategy, were effective in preventing the algorithm from falling into premature convergence.
Furthermore, Table 4 shows the rankings generated by Friedman's test based on the average and best values of the intra-cluster distances. Friedman's test reveals a significant difference in favour of the AMADE algorithm, with a p-value of 0.000189 for the test based on the average value of the intra-cluster distances and 0.0000128 for the test based on the best value, both of which are below the significance level (α = 0.05).
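The ranking step behind Friedman's test can be illustrated with a minimal sketch. It assumes that a lower intra-cluster distance is better (rank 1), that tied algorithms share the mean of their ranks, and that no tie correction is applied to the statistic. The function names and the toy scores are hypothetical and are not taken from Tables 3 or 4; the statistical software actually used by the authors is not stated.

```python
from typing import Dict, List


def average_ranks(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean rank of each algorithm over all datasets (lower score = rank 1)."""
    names = list(scores)
    n_datasets = len(scores[names[0]])
    totals = {name: 0.0 for name in names}
    for d in range(n_datasets):
        # Order the algorithms by their score on dataset d, best first.
        row = sorted(names, key=lambda name: scores[name][d])
        pos = 0
        while pos < len(row):
            # Extend the group while the next algorithm has the same score.
            end = pos
            while end + 1 < len(row) and scores[row[end + 1]][d] == scores[row[pos]][d]:
                end += 1
            # Tied algorithms share the mean of the 1-based ranks they span.
            shared_rank = (pos + end) / 2 + 1
            for name in row[pos:end + 1]:
                totals[name] += shared_rank
            pos = end + 1
    return {name: totals[name] / n_datasets for name in names}


def friedman_statistic(mean_ranks: Dict[str, float], n_datasets: int) -> float:
    """Friedman chi-square statistic computed from the mean ranks."""
    k = len(mean_ranks)
    sum_sq = sum(r * r for r in mean_ranks.values())
    return 12.0 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4.0)


# Hypothetical toy scores: three algorithms on three datasets.
toy = {
    "alg_a": [1.0, 2.0, 1.0],
    "alg_b": [2.0, 1.0, 2.0],
    "alg_c": [3.0, 3.0, 3.0],
}
ranks = average_ranks(toy)
print(ranks, friedman_statistic(ranks, n_datasets=3))
```

The p-value is then obtained by comparing the statistic with a chi-square distribution with k - 1 degrees of freedom, which is omitted here to keep the sketch free of external dependencies.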
Holm's procedure is employed as a post-hoc method to detect the statistical difference between the control case (ranked first) and the remaining cases [72]. Table 5 presents the adjusted p-values obtained by Holm's procedure, with the AMADE algorithm used as the control algorithm. The rejection of the null hypothesis relies on the obtained p-value, which must be less than the adjusted value of α (α/i), where i is the rank of the algorithm. Holm's procedure indicates that AMADE is statistically better than DE, GA and HyGA, but that it does not differ significantly from the HyDE algorithm. However, the results reported in Table 5 demonstrate that the proposed AMADE approach outperformed HyDE on all of the tested datasets in all criteria. Based on the standard deviation criterion, AMADE is considered more robust than HyDE as well as the other algorithms. Moreover, AMADE can find globally optimal solutions in most of the cases.
Additionally, in order to show the superiority of the AMADE algorithm over the other algorithms, Fig 15 presents the box plots for all datasets from 31 runs. It reveals that AMADE did not produce any outliers on any dataset, and that the distributions of the solutions obtained by AMADE are centred on the median. The box plots for AMADE were thick and near the minimum intra-cluster distance values. The thickness of the box plots indicates that the results deviate little from the median value, which means that the performance of the algorithm was stable over the 31 runs.
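The step-down comparison used by Holm's procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the p-values are those of each algorithm against the control, they are sorted from most to least significant, and the threshold at each step is α divided by the number of hypotheses still under test, so the last comparison uses α/1. The function name and the p-values in the example are hypothetical and are not the values of Table 5.

```python
from typing import Dict


def holm_procedure(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Return, for each compared algorithm, whether its null hypothesis is rejected."""
    # Sort the comparisons from the smallest (most significant) p-value upwards.
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = {name: False for name in p_values}
    for step, (name, p) in enumerate(ordered):
        # Thresholds are alpha/m, alpha/(m-1), ..., alpha/1.
        threshold = alpha / (m - step)
        if p >= threshold:
            # Step-down rule: once one hypothesis is retained, all later ones are too.
            break
        rejected[name] = True
    return rejected


# Hypothetical p-values of three algorithms against a control algorithm.
example = {"alg_1": 0.001, "alg_2": 0.02, "alg_3": 0.3}
print(holm_procedure(example))
```

The early exit is the design choice that distinguishes Holm's procedure from comparing every p-value against its own threshold independently: a later hypothesis cannot be rejected once an earlier one has been retained.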
The HyDE algorithm achieved the second best performance on the Cancer, CMC and Iris datasets, while it obtained almost the same performance as the HyGA algorithm on the Glass, Vowel and Wine datasets. The standard DE algorithm obtained better results than the GA algorithm on all datasets, although the performance of both GA and DE is weak compared with the hybrid algorithms. In general, the improved memetic phases, namely the restart phase together with the DE mutation phase, were effective in keeping the diversity of the population as high as possible during the evolutionary process, which helped to avoid instability in the obtained results.
Furthermore, in order to validate the feasibility of the results, the cluster centres obtained by the AMADE algorithm are shown in Tables 6-8, where all datasets with the same number of clusters are grouped in one table. The cluster centres can be used to validate the sum of intra-cluster distances given in Table 3. This can be done by assigning the data objects of each dataset to the nearest cluster centres given in Tables 6-8, after which the best intra-cluster distance values in Table 3 must be reached. For example, when the 178 data objects of the Wine dataset are allocated to the nearest of the three cluster centres shown in Table 6, the resulting sum of intra-cluster distances should equal 16292.279, the best value obtained by the AMADE algorithm on the Wine dataset as reported in Table 3. Otherwise, either the best centres in Table 6 or the best values in Table 3 are invalid. The same procedure can be used to validate the cluster centres of the other datasets.
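The validation procedure described above can be sketched in a few lines. The sketch assumes that the intra-cluster distance is the Euclidean distance between a data object and its nearest cluster centre; the function name and the toy points are hypothetical, and the actual centres from Tables 6-8 are not reproduced here.

```python
import math
from typing import Sequence


def sum_intra_cluster_distances(
    points: Sequence[Sequence[float]],
    centres: Sequence[Sequence[float]],
) -> float:
    """Assign each point to its nearest centre and sum the resulting distances."""
    total = 0.0
    for point in points:
        # Euclidean distance to every centre (assumed metric); keep the smallest.
        nearest = min(
            math.sqrt(sum((p - c) ** 2 for p, c in zip(point, centre)))
            for centre in centres
        )
        total += nearest
    return total


# Hypothetical toy data: three 2-D points and two cluster centres.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
centres = [(0.0, 1.0), (10.0, 0.0)]
print(sum_intra_cluster_distances(points, centres))  # 2.0
```

Applied to a real dataset, the returned value would be compared with the corresponding best value reported in Table 3, allowing for rounding in the published centres.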

Comparison between AMADE and state of the art
In order to evaluate the performance of AMADE, its results are compared with those of well-known algorithms, namely the black hole algorithm (BH) [40], age-based particle swarm optimization (PSOAG) [68], a dynamic shuffled differential evolution algorithm (DSDE) [42], the krill herd algorithm (IKHCA) [69], a hybrid of the krill herd algorithm with the harmony search algorithm (H-KHA) [17] and the hybrid ICMPKHM [19].
The related comparison results are presented in Table 9, which gives the average of the intra-cluster distances for AMADE and the other algorithms on the Iris, Wine, CMC, Glass and Cancer datasets. The results indicate that AMADE shows consistent performance and better results than IKHCA, ICMPKHM, PSOAG, H-KHA and BH on almost all the datasets. AMADE achieved the second best results after the MSDE algorithm on the Wine, CMC and Cancer datasets. The AMADE algorithm also obtained the second best results on the Iris and Glass datasets. The results shown in Table 9 reveal that the performance of AMADE is consistent across all the datasets compared to the state-of-the-art algorithms with respect to the average of the intra-cluster distances.
To further analyse the results in Table 9, the rankings of the compared algorithms generated by Friedman's test, based on the average of the intra-cluster distances, are shown in Table 10. Friedman's test shows a significant difference between AMADE and the other compared algorithms, with a p-value of 0.02465 based on the average function, which is below the significance level (α = 0.05). The AMADE algorithm shares the best rank with the DSDE algorithm [42], which uses the DE algorithm with multiple population approaches to reach the best average of the intra-cluster distances for the best solutions. The results show that AMADE achieved the best ranking among the clustering algorithms based on the average of the intra-cluster distances. The BH algorithm achieved the third best rank, the ICMPKHM algorithm the fourth, and H-KHA the next. Lastly, PSOAG and IKHCA achieved the worst ranks among the compared algorithms.
The rankings generated by Friedman's test shown in Table 9 reveal that the performance of AMADE is consistent compared to the state-of-the-art algorithms with respect to the average of the intra-cluster distances.
Furthermore, the performance of AMADE is compared, based on the computed accuracy, with four algorithms that reported the accuracy performance measure in their research, namely K-means [73], PSOAG, DSDE and IKHCA, as shown in Table 11. The accuracy obtained by AMADE is competitive with that of the other clustering algorithms, and it reaches the optimum accuracy on the CMC and Cancer datasets. The IKHCA algorithm achieved the best accuracy on the Wine, CMC and Glass datasets, while the PSOAG algorithm achieved the best result on the Iris dataset. However, the accuracy results reveal the consistent performance of the AMADE algorithm on all datasets: it obtained the second best accuracy on the Glass and Wine datasets and the third best accuracy on the Iris dataset.
Finally, the performance of AMADE is compared, based on the computed F-measure, with three algorithms that have reported the F-measure external performance measure in their research, namely K-means [74], KSC-LCA [74] and ICMPKHM [19], as shown in Table 12. The F-measure obtained by AMADE outperformed that of the other clustering algorithms: it reached the optimum F-measure value on the Iris, CMC, Cancer and Vowel datasets, and obtained the second best F-measure on Wine and Glass. The KSC-LCA algorithm achieved the best F-measure on the Wine dataset, and the ICMPKHM algorithm achieved the best result on the Glass and Cancer datasets. The results shown in Table 12 reveal the consistent performance of AMADE across all datasets based on the F-measure.

Conclusions and future work
In this work, an adaptive memetic differential evolution algorithm (AMADE) was proposed for efficient data clustering. The combination of the MA and DE algorithms aimed to balance exploration and exploitation. The algorithm introduced an adaptive DE mutation operator and a neighbourhood selection heuristic, which are combined with the evolutionary steps of the memetic algorithm. These enhancements helped to avoid instability in the obtained results by keeping the diversity of the population as high as possible during the evolutionary process.
Experiments conducted on six real-life datasets with different levels of complexity demonstrated that AMADE showed consistent performance compared to the state-of-the-art algorithms with respect to the average of the intra-cluster distances, accuracy and F-measure validity measures. The AMADE algorithm achieved the optimum accuracy on the CMC (45.62%) and Cancer (96.486%) datasets, and also reached the optimum F-measure on the Iris (90.1%), CMC (52.10%), Cancer (96.4%) and Vowel (66.20%) datasets. Future work will focus on using other data clustering objective functions to handle a variety of categorical and mixed-data datasets, and on how to associate validity measures with each other when they are combined in multi-objective approaches.