Multi-population Black Hole Algorithm for the problem of data clustering

The retrieval of important information from a dataset requires applying a special data mining technique known as data clustering (DC). DC classifies similar objects into groups with similar characteristics. Clustering involves grouping the data around k cluster centres that are typically selected randomly. Recently, the issues behind DC have called for a search for alternative solutions, and a nature-inspired optimization algorithm named the Black Hole Algorithm (BHA) was developed to address several well-known optimization problems. The BHA is a population-based metaheuristic that mimics the natural phenomenon of black holes, whereby individual stars represent the potential solutions revolving around the solution space. The original BHA showed better performance than other algorithms when applied to benchmark datasets, despite its poor exploration capability. Hence, this paper presents a multi-population version of BHA, called MBHA, as a generalization of the BHA in which the performance of the algorithm is not dependent on a single best-found solution but on a set of generated best solutions. The formulated method was tested on a set of nine widespread and popular benchmark test functions. The ensuing experimental outcomes indicated that the method generated highly precise results compared to BHA and the comparable algorithms in the study, as well as excellent robustness. Furthermore, the proposed MBHA achieved a high rate of convergence on six real datasets (collected from the UCI machine learning repository), making it suitable for DC problems. Lastly, the evaluations conclusively indicated the appropriateness of the proposed algorithm for resolving DC issues.


Introduction
The past few decades have seen various nature-inspired algorithms being developed to resolve numerical optimization issues. These algorithms are key players in unravelling a multitude of engineering optimization problems owing to their global exploration and exploitation capabilities, and they are characterised by mimicking the behaviour of living organisms in nature. In data clustering (DC), the distances between objects and their respective centroids are minimized: objects within a cluster should have as much similarity as possible while being significantly different from objects in other clusters. In other words, DC can be viewed as an optimization problem where the objective is to partition a given set of data points into a fixed number of clusters such that the within-cluster similarity is maximized and the between-cluster similarity is minimized. Among the common approaches to solving DC problems is to formulate them as meta-heuristic optimization problems [41][42][43][44][45][46]. The study by [27] recently devised one of these meta-heuristic optimization methods, known as the "black hole", which replicates the natural action of a black hole (BH) of drawing in neighbouring stars. The concept of the BH and its interaction with the neighbouring stars formed the basis of the BHA. In this regard, the work presented by [27] has flaws in terms of exploration, as the process of obtaining an optimal solution necessitates too many iterations. The BHA and its enhanced versions have recently been utilised to tackle several well-known optimization problems [47][48][49][50][51][52][53][54][55][56][57][58][59][60][61].
Recently, several metaheuristics have been enhanced by incorporating a multi-swarm or multi-population approach, including the Genetic Algorithm (GA) [62], Artificial Bee Colony (ABC) [63], Particle Swarm Optimizer (PSO) [15,[64][65][66], and Nomadic People Optimizer (NPO) [33], owing to their capability to use different populations, each with its own parameter settings, and to search the space simultaneously. As a result, they have significantly enhanced the performance of the original metaheuristics [67,68]. This paper proposes a multi-population BHA as a generalization of the BHA, in which the algorithm no longer depends on a single best solution; instead, a set of best solutions, maintained for some time during the search process, is generated. The resulting algorithm is called the MBHA. Furthermore, the algorithm's objective function was replaced with a more effective one for resolving the clustering issue, and the algorithm was compared with the original BHA on several datasets.
The rest of this article is organised as follows: Section 2 focuses on earlier reported data clustering methods, while Sections 3 and 4 focus on the BHA and the proposed MBHA, respectively. Section 5 presents the experimental results, while Section 6 concludes the work.

Background
This section offers an overview of the data clustering optimization problem and the black hole optimization algorithm. First, it explains data clustering as an optimization problem, providing the necessary mathematical formulation, and reviews the most significant related works. The second subsection explains the original version of the Black Hole Algorithm (BHA) and discusses its advantages and drawbacks.

The problem of data clustering
Clustering is a crucial approach to unsupervised data classification that involves grouping a set of vectors or patterns (such as data items, observations, or feature vectors) in a multi-dimensional space [69][70][71]. The process of DC categorizes the dataset into a specific number of clusters while reducing the intra-object distance within each cluster. The rearrangement of a given set of data patterns is referred to as cluster analysis; a pattern is usually represented by one of two things: 1) a vector of measurements, or 2) a point in a multi-dimensional space. The procedure is conducted to create clusters that are differentiated by similarity attributes [72]. Some of the common application areas of DC are image processing, analysis of medical images, and statistical data analysis; clustering is also useful in various science and engineering fields and is sometimes used interchangeably with statistical data analysis. The differences across clusters can be attributed to their sizes, shapes, and densities, as seen in Fig 1. Nevertheless, noise present in the data may pose a challenge for cluster detection, whereby the ideal cluster is fundamentally designated as "a set of points that is compact and solitary". Although humans are commonly credited with cluster-seeking proficiency in up to three dimensions, automatic algorithms remain the go-to for high-dimensional data. This fact, alongside the fact that the number of clusters for a provided dataset is generally unknown, has generated the thousands of clustering algorithms underlined in publications [73]. Meanwhile, the learning task can be described in pattern recognition terms, in which data analysis is commonly linked with predictive modelling: the training data is used to predict the behaviour of the unknown test data.
Assessment of data similarity may require the use of distance measures. The problem may be designed thus: given N data records, each record is assigned to only one of the K clusters. Clustering is then performed using one of several criteria that serve as the process objective function (OF); minimizing the sum of squared Euclidean distances (ED) between each record and the center of its related cluster is one of the most commonly used, as shown below:

F(O, Z) = Σ_{i=1}^{N} min_{j ∈ {1,...,K}} ‖O_i − Z_j‖²   (1)
where ‖O_i − Z_j‖ is the Euclidean Distance (ED) between a data record O_i and the cluster center Z_j, and N and K are the numbers of data records and clusters, respectively. Combining a nature-inspired optimization algorithm with a clustering algorithm has been shown to produce optimal solutions. The study by [74] presented the adaptive time-dependent transporter ant for clustering (ATTA-C), which modifies the standard Ant Colony Optimizer (ACO) ant-based clustering algorithm. It aims to penalize high dissimilarities, enhance the spatial separation between clusters, and facilitate the clustering procedure. Achieving this requires the calculation of a fitness value for each clustering solution, which is carried out using a neighbourhood function (NF). Meanwhile, the study [75] introduced a novel Particle Swarm Optimizer (PSO) approach for clustering, termed CPSO, which is applicable whether the number of clusters is known or unknown. It proceeds according to the gbest neighbourhood topology, encoding cluster centroids in particles and possibly generating new partitions during optimization, either by removing or splitting clusters, until the allocated number of clusters is reached.
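To make the intra-cluster objective of Eq (1) concrete, it can be sketched in a few lines of Python (a minimal sketch; the function name, the NumPy dependency, and the array layout are our own assumptions, not part of the original formulation):

```python
import numpy as np

def clustering_objective(records, centers):
    """Sum, over all N records, of the squared Euclidean distance
    between each record O_i and its nearest cluster center Z_j."""
    # Pairwise distance matrix of shape (N, K).
    dists = np.linalg.norm(records[:, None, :] - centers[None, :, :], axis=2)
    # Each record is assigned to (and scored against) its closest center.
    return float(np.sum(np.min(dists, axis=1) ** 2))
```

A partition that places every record exactly on a center scores 0; the metaheuristics surveyed here all minimise some variant of this quantity.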
Furthermore, an improved version of the Firefly Algorithm (FA) was proposed by [76], in which the FA is trained on a randomly selected 75% of the dataset to obtain the cluster centres, while the remaining 25% serves as a test set to investigate the algorithm's performance [77]. The Krill Herd Algorithm (KHA) simulates the herding behaviour of individual krill. A density-based approach allows the discovery of clusters by partitioning regions of sufficiently high density into arbitrarily shaped clusters of krill individuals. The objective of the krill movement, pursued via foraging movement and random diffusion, is to minimize each individual's distance from the food source and from highly dense herds. A density-based cluster can then be described as a maximal set of density-connected objects with respect to density-reachability, together with noise objects. The study [44] previously suggested an artificial bee colony (ABC) clustering approach for categorical data; a one-step k-modes procedure is first developed and then incorporated with the ABC to cluster categorical data. Meanwhile, the study by [78] introduced C-ESA, a hybridization of the K-means clustering algorithm and the Elephant Search Algorithm (ESA), for data clustering, finding the best centroid locations and enhancing clustering precision.
In [79], a map/reduce programming model for the ABC algorithm was designed, capable of configuring and incorporating data in a multi-node environment. The ABC achieved the fastest completion time during execution, displaying high efficiency for all types of data thanks to the parallelism it offers. It also combines local and global search techniques to achieve a trade-off between exploration and exploitation capabilities when obtaining optimized clusters. The designed map/reduce ABC mechanism was deployed on single-node and multi-node Hadoop platforms, whereby the mapper phase generates the best fitness value by mimicking the behaviour of the employed bees, while the reducer phase computes the probability value for cluster optimization by mimicking the onlooker and employed bees. The experimental outcomes reported run times for varying dataset sizes in single-node and multi-node environments. Upon evaluating the performance of the ABC scheme alongside the conventional Differential Evolution (DE) and PSO schemes, the ABC method showed superior results for optimal cluster selection, and it also minimized the execution time and classification errors in optimal cluster selection on a multi-node Hadoop cluster architecture.
Meanwhile, a fresh gravitational-based heuristic for data clustering was described by [80], which addresses excessive centroid movement caused by the accumulated centroid velocity history in the gravitational clustering algorithm, thereby improving the balance between exploration and exploitation capabilities. The technique includes an initialization phase that uses the variance and median approach so as to avoid the effects of random initialization. Following that, the centroid's accumulated velocity history is removed, leaving only the force of the data points in the cluster associated with the centroid to influence its position throughout any iteration.
In addition, an effective and superior alternative clustering method presented by [81] applies the nature-inspired krill herd algorithm. The problem is translated into an optimization search problem via objective function minimization to identify the optimum centre of each cluster. Multiple real and synthetic databases are then reviewed, with comparison studies undertaken to elucidate the purpose of the ESKH-C technique. The technique is specifically designed to attain quality clustering on real and synthetic databases of dissimilar dimensions alike. The simulation studies also indicated that the technique could form optimal cluster groups having different data shapes, sizes, dimensions, and densities. In [82], a modified Bee Colony Optimization (MBCO) was implemented and hybridized with K-means for application to data clustering. The technique models bees' traits of forgiveness and a fair chance, for trustworthy bees and their opposite alike, and is associated with a probability-based selection (Pb-selection) approach that allocates unassigned data points in every iteration. The paper [83] presented a semi-supervised K-means clustering framework, whereby a K-means clustering framework is initially applied to the gene data; an enhanced semi-supervised K-means clustering then performs greedy iteration to improve the clustering. Simulations subsequently showed that the global semi-supervised K-means clustering algorithm offers superior optimization capacity and cluster effect in comparison with the MDO algorithm.
The issue of local optima in K-means was also addressed in [84], in which a new clustering framework was designed via a hybridized Crow Search Algorithm (CSA), a novel population-based metaheuristic optimization algorithm rooted in crows' intelligent behaviour; the resulting K-means clustering algorithm is called CSAK-means. Meanwhile, [43] recently designed an Elephant Herding Optimization suited for clustering tasks, in which the intra-cluster distance and cost function are reduced.

Black Hole Algorithm
The BHA is based on the black hole phenomenon: an expanse of space housing such a concentrated mass that no adjacent object, not even light, can escape its gravitational pull; anything falling into it is obliterated from the cosmos. The method is made up of two parts: 1) star movement, and 2) star re-initialization upon entering the D-dimensional hypersphere around the BH (i.e., the event horizon). It functions as follows: first, N+1 stars x_i ∈ R^D, i = 1,...,N+1, are initialized in the search space, where N is the population size. After a fitness evaluation, the best star is designated as the black hole x_BH. Because the black hole is static, it does not move until another star reaches a better solution. As a result, the number of individuals searching for the best value is N, and in each generation every star shifts towards the BH as seen in the following equation [27]:

x_i(t+1) = x_i(t) + rand × (x_BH − x_i(t)),   i = 1, ..., N   (2)

where rand is a random number in the range [0,1].
Furthermore, the BHA suggests that a star that comes too close to the BH and passes through the event horizon is removed. The following equation describes the radius of the event horizon (R) [27]:

R = f_BH / Σ_{i=1}^{N} f_i   (3)

where f_BH and f_i are the fitness values of the BH and the i-th star, respectively, and N represents the number of stars considered as candidate solutions. When R is greater than the distance between a candidate solution and the BH (the best solution), the related candidate collapses, causing the formation of a new candidate solution that is distributed arbitrarily over the search space. The BHA is characterized by a simple, parameter-free structure and can be easily implemented. Compared to other heuristics, which can be trapped in locally optimal solutions, the BHA was reported to converge towards the global optimum across iterations [27,85]. Although using the BHA as a clustering method yields outstanding results, it has drawbacks due to a lack of balance between exploration and exploitation capabilities. A star that finds a better solution than the existing BH becomes the new BH, thereby altering the direction of the other stars. Furthermore, the event horizon must be conceptualized because the stars may converge rapidly and be absorbed by the BH. This problem is caused by the BHA's lack of exploration capability: it does not provide intensified processes for exploration or collect information about previously found solutions; instead, only a restart approach is applied to each absorbed star [86].
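The two mechanisms just described, movement towards the black hole (Eq 2) and re-initialisation inside the event horizon (Eq 3), can be sketched as one generation of the algorithm (a hedged sketch: the search bounds, function names, and minimisation convention are our own assumptions):

```python
import numpy as np

def bha_step(stars, fitness, rng, low=-5.0, high=5.0):
    """One generation of the basic BHA; `fitness` is minimised."""
    fit = np.array([fitness(s) for s in stars])
    bh = int(np.argmin(fit))                 # best star becomes the black hole
    x_bh, f_bh = stars[bh].copy(), fit[bh]

    radius = f_bh / np.sum(fit)              # Eq (3): event-horizon radius
    for i in range(len(stars)):
        if i == bh:
            continue                         # the black hole itself is static
        # Eq (2): the star drifts towards the black hole.
        stars[i] = stars[i] + rng.random() * (x_bh - stars[i])
        # A star crossing the event horizon is replaced by a new random star.
        if np.linalg.norm(stars[i] - x_bh) < radius:
            stars[i] = rng.uniform(low, high, size=stars[i].shape)
    return stars, x_bh
```

Because the black hole is never moved or replaced within a generation, the best fitness found so far can only improve from one generation to the next.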

Multi-population Black Hole Algorithm
The weakness in the exploration capability of the Black Hole Algorithm (BHA) stems from its low population diversity. The algorithm tends to converge too quickly to local optima, which limits its ability to explore the search space and find global optima [87]. Hence, when exploitation is performed more than exploration, the chances of being trapped in a local optimum increase. In this paper, an enhanced version of the BHA, called the "Multi-Population Black Hole Algorithm (MBHA)", is proposed for the problem of data clustering. The MBHA is based on the original BHA but uses multiple populations instead of a single one. Each population comprises several candidate solutions (stars) that are randomly generated in the search space. The populations are initialised and their fitness values assessed, whereby the candidate with the best fitness value in each population is chosen as its black hole, while the rest remain normal stars. As the black hole can absorb the stars around it, the absorption process takes place after the black hole and stars are initialised, at which point the stars move. The absorption process is formulated as seen below:

x_i(t+1) = x_i(t) + c × rand × (x_BH − x_i(t)),   i = 1, ..., N   (4)

where x_i(t) and x_i(t+1) are the locations of the i-th star at iterations t and t+1, x_BH is the location of the black hole in the search space, c is a constant, rand is a random number in the interval [0,1], and N is the number of stars (candidate solutions) in the population. The constant c is utilized to restrict the scattering of solutions in the space, as well as to yield a higher convergence speed for the algorithm.
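A sketch of the absorption step described above, under the assumption that the constant c scales the random step (the exact placement of c is not spelled out here, so this is one plausible reading rather than the definitive formulation):

```python
import numpy as np

def absorb(star, x_bh, c, rng):
    """Move a star towards the black hole x_BH; the constant c damps
    the random step to limit scattering and speed up convergence."""
    return star + c * rng.random() * (x_bh - star)
```

With 0 < c ≤ 1, the star never overshoots the black hole, so its distance to x_BH is non-increasing on every application.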
While running the algorithm, a star (or the BH) in a population may or may not arrive at a location offering a lower cost than the current black hole. This motivates the concept of the Search Counter (SC), which counts the number of times a population evolves without finding an improved fitness value. If a star reaches a better location, there is a probability of generating a new star for that population (prob_generating_star), formulated as follows: where SC_max is the maximum value of SC.
After checking the probability of generating a new star, the SC is reset to zero. This probability helps a population that has lost many stars, because its evolution stalled for some time, to acquire new stars and gives it a longer life span. A population loses some of its stars when they cross the event horizon, since the boundary of a black hole in space is shaped as a sphere. The black hole sucks in every star that ventures into its event horizon, and every star's death is compensated by a new replacement star, generated with probability prob_replace and distributed arbitrarily in the search space. The prob_replace is formulated in the same way as prob_generating_star, which helps a progressing population to keep its number of stars as large as possible. The radius of the event horizon is calculated as in the BHA, using Eq (3).
A population is omitted if its number of stars falls below the minimum allowed number of stars per population. At each iteration, there is a probability of generating a new population (prob_generating_population), which helps to explore the entire search space and avoid local minima within a minimum number of iterations (speeding up convergence towards the global optimum in early iterations).
where rand is a random number in the interval [0,1]. The solutions of the new population are generated in two ways: 1) arbitrarily in the search space, or 2) arbitrarily chosen from other populations. The ratio r_g is used to mix the two ways and is formulated as follows:

r_g = itr / max_iterations   (7)

where itr is the iteration at which the new population is generated and max_iterations refers to the total number of iterations. Therefore, during the early iterations the search process is a global search (r_g is small) and the solutions are arbitrarily generated in the search space.
As the iterations continue, the search becomes a local search (r_g becomes larger) and the solutions are taken from other populations. Note that the value of r_g can also be set to a constant. Thus, to generate a new population, there are two cases: if r_g is less than or equal to 0.5, a new random population is generated; otherwise, the population is generated based on the position of the global best black hole (BH_G), as shown in the following equations:

Pop(P) = { a random population, if r_g ≤ 0.5; generated based on BH_G via Eq (9), otherwise }   (8)

Pop(P).X_i = Pop(P).X_i + (BH_G − Pop(r_1).X_{r_2}) × rand   (9)

where X_i represents a new star in the population P, r_1 and r_2 represent a randomly selected population and a randomly selected star from that population, respectively, and rand is a random number in the range [0,1]. This design overcomes the BHA's weaknesses and strikes a good balance between global search and local search. The key processes of the enhanced algorithm are summarised in the pseudocode in Fig 2, while the flowchart is given in Fig 3.
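The two-case generation scheme of Eqs (8) and (9) can be sketched as follows (a sketch under our own assumptions: r_g is taken as the ratio itr/max_iterations, the search bounds are illustrative, and the borrowed star is pulled towards BH_G):

```python
import numpy as np

def generate_population(itr, max_iterations, bh_global, populations,
                        pop_size, dim, rng, low=-5.0, high=5.0):
    """Generate a replacement population in the spirit of Eqs (8)-(9)."""
    r_g = itr / max_iterations
    if r_g <= 0.5:
        # Eq (8), first case: a purely random population (global search).
        return rng.uniform(low, high, size=(pop_size, dim))
    # Eq (9): stars borrowed from existing populations, pulled towards
    # the global best black hole BH_G (local search).
    new_pop = np.empty((pop_size, dim))
    for i in range(pop_size):
        r1 = rng.integers(len(populations))       # random population
        r2 = rng.integers(len(populations[r1]))   # random star within it
        star = populations[r1][r2]
        new_pop[i] = star + (bh_global - star) * rng.random()
    return new_pop
```

Early in the run this produces fresh random populations that widen exploration; late in the run it recombines existing stars around BH_G, which intensifies the local search.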

Results and discussion
MBHA performance was assessed by carrying out two sets of experiments. First, several mathematical objective functions with multiple local minima were used to evaluate the developed algorithm and to compare it with the original BHA and other related works. Second, the MBHA algorithm was validated and tested on six benchmark datasets and compared to other powerful state-of-the-art algorithms.

Evaluation on benchmark test functions
To demonstrate the superior exploration of the MBHA compared to the standard BHA, further verification was carried out using a set of multi-modal objective functions in a multi-dimensional search space, listed in Table 1, with the parameter settings given in Table 2. These parameters were kept at the default values specified in the original versions of the algorithms. Moreover, for each function, the convergence curve of the search was generated and compared with that of the original BHA. The performance of the new MBHA was benchmarked against 9 popular metaheuristics: Genetic Algorithm (GA) [88], Artificial Bee Colony (ABC) algorithm [89], Particle Swarm Optimization (PSO) [90], Levy Firefly Algorithm (LFFA) [91], Grey Wolf Optimizer (GWO) [92], Ant Colony Optimization (ACO) [30], Bat Algorithm (BA) [93], Flower Pollination Algorithm (FPA) [94], and Black Hole (BH) [27]. The assessments and experiments were carried out accordingly, with MBHA and BHA each subjected to 30 different runs. The best, mean, error rate, and standard deviation were then calculated for each algorithm, as seen in Table 3. Although the statistical results presented in Table 3 provide a first insight into the performance of the MBHA, a pair-wise Wilcoxon Signed-Rank Test with a statistical significance level α = 0.05 is utilized for a better comparison; the test compares the performance of the MBHA against the standard BHA and PSO individually.
The null hypothesis (H_0) for the Wilcoxon Signed-Rank Test is that there is no significant median difference between the pairs of samples. The results are compared at a 95% level of confidence: if the p-value of the Wilcoxon statistic is less than or equal to alpha (α = 0.05), then H_0 is rejected. The statistical calculations were performed using the SPSS Statistics software. In Table 4, the statistical results of the BHA, PSO, GA, ABC, and ACO algorithms compared to the MBHA are given.
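The same pairwise test can be reproduced programmatically (a sketch; the helper name and the per-run score vectors are illustrative, and availability of scipy.stats.wilcoxon is assumed):

```python
from scipy.stats import wilcoxon

def significantly_different(scores_a, scores_b, alpha=0.05):
    """Pairwise Wilcoxon signed-rank test on matched per-run scores.
    H0: the median of the paired differences is zero; it is rejected
    when the p-value falls at or below alpha."""
    _, p_value = wilcoxon(scores_a, scores_b)
    return p_value <= alpha, p_value
```

The test is paired, so the two score lists must come from the same sequence of runs (same seeds, same functions) for the comparison to be valid.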
The convergence curves were also generated for the searching patterns on the 6 functions and compared with those of the original BHA. As can be seen from Figs 4 to 9, the MBHA shows faster convergence. For the test problems, the MBHA achieved a better fitness value than the BHA throughout the optimization process, meaning that the MBHA is more efficient than the BHA and more suitable for these optimization problems.

Evaluation on benchmark dataset
To ensure a fair comparison with existing methods, the same datasets used in the original version of the black hole algorithm and in related works were utilized, since using different datasets would make it difficult to compare performance; although testing on multiple datasets is important, consistency in dataset selection was prioritized. Six datasets were used to evaluate the performance of the suggested algorithm for data clustering: Iris, Wine, Glass, Cancer, Contraceptive Method Choice (CMC), and Vowel. Table 5 outlines their specific attributes; the datasets were all obtained from the UCI ML repository.

PLOS ONE
2. Wine dataset. This dataset depicts the quality of wines according to their physicochemical attributes; the wines were all harvested from the same Italian region but from 3 different cultivars. The three wine types comprise 178 instances in total, with 13 numeric features representing the quantities of the 13 constituents found in each wine.
3. CMC dataset. This dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. The sample comprises married women who, at the time of the interview, were either not pregnant or did not know whether they were pregnant. The problem is to predict a woman's current choice of contraceptive method (i.e., no use, short-term use, or long-term use) from her socioeconomic and demographic attributes.

4. Cancer dataset. This dataset represents the Wisconsin breast cancer database; it comprises 683 instances with 9 features: "Clump Thickness, Cell Size Uniformity, Cell Shape Uniformity, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, and Mitoses." Each instance is labelled as either benign or malignant.

5. Glass dataset. This dataset is made up of 214 objects with 9 features: "refractive index, silicon, potassium, sodium, calcium, magnesium, aluminium, barium, and iron." Six types of glass were used in the data sampling process: "non-float processed building windows, float processed building windows, containers, tableware, float processed vehicle windows, and headlamps."

6. Vowel dataset. This dataset is made up of 871 Indian Telugu vowel sounds; it has 3 attributes corresponding to the first, second, and third vowel frequencies, as well as 6 overlapping classes.
The comparison was conducted by calculating four statistical values (Best, Average, Worst, and Standard deviation) after executing each algorithm for 30 runs, the output being the sum of intra-cluster distances. Additionally, all algorithms were compared on the value of the error rate. These two measurements are defined as follows: 1. The sum of intra-cluster distances as an internal quality measure: the distances between each data object and the centre of its cluster are calculated and summed, as shown in Eq (1). A smaller sum of intra-cluster distances typically correlates with higher cluster quality; this sum was one of the fitness components evaluated in this study.
2. Error Rate (ER) as an external quality measure: the percentage of misplaced data objects, as given by the equation below:

ER = (Number of misplaced objects / Total number of objects in the dataset) × 100   (10)
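Eq (10) is straightforward to compute once each object's predicted cluster has been mapped to a class label (a minimal sketch; the label matching is assumed to have been done already, and the function name is ours):

```python
def error_rate(predicted, actual):
    """Eq (10): percentage of objects assigned to the wrong cluster."""
    misplaced = sum(p != a for p, a in zip(predicted, actual))
    return misplaced / len(actual) * 100.0
```

Note that raw cluster indices are arbitrary, so in practice each cluster must first be relabelled with the majority class of its members before the error rate is meaningful.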
Several metaheuristic methods were compared with the proposed algorithm, namely K-means [72], PSO [35], ACO [95], KH [77], GSA [41], BB-BC [41], CS [96], TS [97], and BHA [27]. In addition, the MBHA was also compared with 9 recently modified hybrid meta-heuristics reported in the literature: K-means++ [98], IKH [98], BSF-ABC [99], ACPSO [66], H-KHA [81], K-MCI, MCI [100], and NM-PSO, K-NM-PSO [99]. The results of the comparisons against the standard meta-heuristic clustering frameworks and the modified hybrid meta-heuristics, for a better benchmarking of the MBHA, are shown in Tables 6 and 8. A summary of the error rates and intra-cluster distances is given in Table 6. Each algorithm was executed for 30 runs, after which the best, average, worst, standard deviation, and error rate values were computed for each algorithm. In the table, the values in bold are the best values obtained for each dataset. The results of the experiments showed that the MBHA outperformed the BHA and K-means. Further comparisons showed that the suggested technique achieved the lowest standard deviation among the compared algorithms, implying that the MBHA consistently stays near its minimum value.
Furthermore, on the Iris dataset the MBHA converged to 96.522 in every run, while on the Wine dataset the MBHA yielded the superior worst-case solution. Similarly, the MBHA also obtained the best optimum value for the Vowel dataset, which was 148,941.00. Therefore, it can be concluded that the MBHA achieved near-best values in all runs, confirming its capacity to yield superior optimal solutions with a small standard deviation within a minimum number of iterations.
The algorithms were further compared statistically to check for significant differences in their performances; the statistical comparison was made using the Friedman and Iman-Davenport tests. Table 7 presents the performance of the algorithms under these statistical tests. Table 8 compares the average intra-cluster distances and error rates of the various clustering algorithms; the MBHA yielded the best performance, revealing superior results across all six datasets. On the Iris dataset, the proposed algorithm achieved a standard deviation of 0.00010, remarkably lower than those of the remaining clustering algorithms, and its best solution of 96.51300 and worst solution of 96.53200 were both superior to the rest. On the Wine dataset, the proposed MBHA obtained an average value of 16,293.400, outperforming all the other algorithms except ACPSO. Meanwhile, on the CMC dataset the proposed algorithm performed exceedingly well; its worst solution of 5531.220 was better than those of the remaining algorithms by a wide margin.
On the Cancer dataset, the proposed MBHA produced the best solution of 2961.950 and an average solution of 2963.900; its standard deviation of 0.0072 was clearly superior to those of K-means++, IKH, BSF-ABC, ACPSO, H-KHA, K-MCI, NM-PSO, K-NM-PSO, and MCI. In contrast, on the Glass dataset the best solution of 199.860 was obtained by the K-MCI algorithm, while on the final Vowel dataset the best average solution of 148,943.00 was obtained by the MBHA. Hence, these results conclusively highlight the effectiveness of the MBHA in resolving complex optimization problems, as it generated the best results on almost all of the datasets in comparison with the competing algorithms. These outcomes were specifically achieved by adding the new operators.

Conclusion
The Black Hole Algorithm (BHA) is a newly developed optimization method that offers a promising approach to complex global optimization problems. However, one of the limitations of the BHA is its lack of balance between exploration and exploitation, which increases the chances of becoming trapped in local minima and thereby prevents it from finding the optimal solution. To overcome this issue, an enhanced version of the BHA based on a new multi-population architecture has been developed in this work, applying effective enhancements including a global exploration operator that facilitates the rapid convergence of the algorithm towards optimal solutions. The proposed algorithm is called the "Multi-Population Black Hole Algorithm (MBHA)". Simulation results demonstrate that the proposed algorithm significantly reduces computation time and achieves its set objectives, prompting further evaluation on data clustering problems. Furthermore, the outcomes confirm the suitability of the proposed algorithm for resolving clustering problems compared with previous reports. Despite the numerous advantages of the MBHA, several aspects require further investigation in future research. First, the algorithm was only benchmarked on nine test functions, so more benchmark problems are needed to provide a comprehensive assessment of its capabilities. Second, the number of populations and their sizes present a fascinating research area that deserves in-depth exploration. Lastly, improving the convergence of the MBHA represents a crucial research topic that warrants further investigation.
In conclusion, the proposed MBHA represents an effective optimization method that offers a viable alternative for solving complex global optimization problems. Nevertheless, further research is necessary to investigate the ability of the algorithm to handle different hard optimization problems, such as feature selection, hyperparameter tuning for Support Vector Machines (SVM), and training artificial neural networks (ANN).