Abstract
Traditional neural networks use gradient descent methods to train the network structure, which cannot handle complex optimization problems. We propose an improved grey wolf optimizer (SGWO) to explore a better network structure. GWO is improved by using circle population initialization, an information interaction mechanism and adaptive position updating to enhance the search performance of the algorithm. SGWO is applied to optimize the Elman network structure, and a new prediction method (SGWO-Elman) is proposed. The convergence of SGWO is analyzed by mathematical theory, and the optimization ability of SGWO and the prediction performance of SGWO-Elman are examined using comparative experiments. The results show: (1) the global convergence probability of SGWO is 1, and its process is a finite homogeneous Markov chain with an absorbing state; (2) SGWO not only has better optimization performance when solving complex functions of different dimensions, but also, when applied to parameter optimization of Elman, significantly improves the network structure, and SGWO-Elman achieves accurate prediction performance.
Citation: Liu W, Sun J, Liu G, Fu S, Liu M, Zhu Y, et al. (2023) Improved GWO and its application in parameter optimization of Elman neural network. PLoS ONE 18(7): e0288071. https://doi.org/10.1371/journal.pone.0288071
Editor: Bilal Alatas, Firat Universitesi, TURKEY
Received: January 30, 2023; Accepted: June 17, 2023; Published: July 7, 2023
Copyright: © 2023 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This work was supported by the National Natural Science Foundation of China from Guangwei Liu under Grant numbers 51974144, Liaoning Provincial Department of Education Project from Wei Liu under Grant numbers LJKZ0340, and the discipline innovation team of Liaoning Technical University from Guangwei Liu and Wei Liu under Grant numbers LNTU20TD-01, LNTU20TD-07. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Elman neural network is a typical local regression network [1]. It has been widely used in the fields of image recognition, fault detection, and big data prediction because of its strong memory capacity and high computational efficiency [2]. The performance of Elman is largely influenced by its training process. Therefore, exploring a high-quality training process has become a key problem to solve in neural network research [3].
In the early 1990s, gradient descent and stochastic methods were the two main Elman training methods [4]. However, gradient descent methods have three main drawbacks [5]: difficulty in finding the global optimal solution, slow convergence, and high dependence on the initial parameters. Similarly, stochastic methods can also weaken training ability through their parameter initialization. As a result, in the late 1990s, some studies constructed a neural network as a nonlinear optimization model to replace the original linear model [6]. Although this approach avoids computing gradient information, it is not applicable when the dimension exceeds the memory range. Accordingly, starting in 2000, some researchers treated training the network structure as an optimization problem of finding the optimal parameters in a finite space [7]. Some scholars solved this optimization problem by heuristic methods [8]. However, these methods needed to enlarge the search space when traversing the set of parameters, which increased the time complexity of the algorithm [9].
To explore better network structures and improve the performance of neural networks, metaheuristic algorithms have become reliable alternatives [10]. Compared to gradient descent methods, metaheuristic algorithms show higher efficiency in avoiding local extrema. These algorithms shift from local search to global search, making them more suitable for global optimization. Therefore, researchers have used metaheuristic algorithms in Elman as an optimization strategy for network structures, and a series of meaningful results have been achieved so far. For example, Zhang et al. used an improved arithmetic optimizer (IAO) to train the Elman network structure [11]; for the soil salinity prediction problem, the sine cosine algorithm (SCA) was applied to adjust the parameters of Elman [12], and the experimental results demonstrated that SCA could improve the prediction efficiency of Elman; some researchers used the particle swarm optimization (PSO) algorithm to optimize Elman parameters, and PSO-Elman models were constructed for load prediction [13], compaction density evaluation [14] and parameter evaluation [15]; metaheuristic algorithms were also combined to adjust the weights and thresholds of Elman, for example, the ant colony algorithm (ACO) and genetic algorithm (GA) were combined to form AGA-Elman [16]. Sun et al. developed an Elman prediction model based on the whale optimization algorithm (WOA) [17], and the experimental results proved that WOA-Elman has good engineering utility in porosity prediction. In addition, WOA-Elman has also played an important role in weather prediction [18] and landslide probability prediction [19].
Although various metaheuristic algorithms have been deployed and studied to train Elman, the local extremum problem still exists. The grey wolf optimizer (GWO) [20] is a recently proposed metaheuristic algorithm. GWO is inspired by the wolf pack hierarchy and the hunting process. GWO has three leaders who are responsible for guiding the wolves to attack, delivering attack information and leading the pack to encircle the prey [21]. During the iterations, the three wolves continuously update their positions and thus search for the global optimum. Due to its few parameters, easy implementation and strong convergence, GWO has shown excellent performance in solving high-dimensional optimization problems [22]. However, the global search capability of GWO is still poor, and it easily falls into local extrema. Moreover, the well-known No Free Lunch theorem [23] states that there is no universal metaheuristic algorithm that can solve all optimization problems. Therefore, our research focuses on two points: first, to propose a more efficient improved grey wolf optimizer based on the algorithm's characteristics; second, to explore a better method for training network structures based on the improved grey wolf optimizer.
Therefore, we propose an Elman training method based on an improved grey wolf optimizer (SGWO). SGWO introduces three strategies into the wolf hunting process: circle chaotic mapping, an information interaction mechanism and an adaptive position update strategy. We use circle chaotic mapping to increase population diversity; in the information interaction mechanism, the head wolf position is perturbed by the Cauchy mutation to jump out of local optima, and the information transfer between wolves is enhanced by the golden sine algorithm, thus accelerating the convergence of SGWO; meanwhile, the adaptive position update strategy is used to adjust the search range autonomously, enabling SGWO to balance global and local searches. In addition, we innovatively introduce the Markov process and probabilistic analysis to demonstrate the convergence of SGWO. Ablation experiments on the three strategies are also conducted, and SGWO is compared with seven optimization algorithms to analyze its optimization performance. Based on this, we incorporate SGWO into the Elman training process and construct an SGWO-Elman prediction model. SGWO-Elman is also compared with three types of methods, including Elman neural networks based on other optimization algorithms, other neural networks, and other neural networks based on SGWO, to verify the prediction ability of the SGWO-Elman model for complex problems.
The rest of the paper is organized as follows. Metaheuristic algorithms classification and variants of GWO are mentioned in Section 2. Section 3 gives a brief description of the grey wolf optimizer. The improved grey wolf optimizer (SGWO) is introduced and proved in Section 4. Section 5 proposes and describes an Elman training method based on SGWO. Experiments and results are discussed in Section 6. Finally, we conclude with a summary of the current work and future research efforts.
2. Related work
Compared with traditional optimization algorithms, optimization techniques that mimic natural phenomena have dominated the field of optimization. These are also known as metaheuristic algorithms. Metaheuristic algorithms are mainly divided into three categories: evolutionary algorithms (EA), physics-based algorithms, and swarm intelligence (SI) based algorithms [24].
EAs mimic the rules of natural evolution. The genetic algorithm (GA) [25] is very popular among EAs. In GA, the initial solution is randomly generated and continuously updated through crossover and mutation operations, and GA finally finds the optimal solution through iteration. Building on GA, many studies have proposed new algorithms, such as differential evolution (DE) [26], the covariance matrix adaptation evolution strategy (CMAES) [27], evolutionary programming (EP) [28], etc.
Physics-based algorithms are inspired by the physical world, such as gravity, explosions, and so on. Among them, gravitational local search (GLS) [29], the multi-verse optimization algorithm (MVO) [30], the sine cosine optimization algorithm (SCA) [12], and the atom search optimization algorithm (ASO) [31] are classic physics-based algorithms. In GLS, the searched individuals are viewed as objects moving in space that attract each other through gravitational interaction. Gravity forces individuals to move towards the individual with the greatest mass, gradually approaching the optimal solution.
SI is inspired by the collective behavior and nature rules of bees or herds. SI includes moth-flame optimization algorithm (MFO) [32,33], white shark optimizer (WSO) [34], whale optimization algorithm (WOA) [17], sparrow search optimization algorithm (SSA) [35], and others. In SI, the particle swarm optimization (PSO) [13] is the most popular algorithm, which updates the location of birds to find the most food.
The Grey Wolf Optimizer [20] is a recently proposed metaheuristic algorithm. GWO is widely used to solve optimization problems due to advantages such as fewer parameters and fast convergence. However, GWO still has poor global search ability and easily falls into local extrema. Recently, many studies have improved the GWO algorithm in different ways. Some studies have proposed population diversity strategies to balance the initial population distribution. Some works have focused on adjusting the parameters of GWO, i.e., A and C. Other works have adjusted the location update strategies to improve GWO performance. Another line of related work combines the GWO algorithm with other existing metaheuristic algorithms. Although the SGWO algorithm is fundamentally different from these previous methods, we still discuss these categories of GWO improvements in detail.
Modifying the random positions of the initial population can balance the spatial distribution of the population. Chaotic mapping strategies and opposition learning strategies are widely used for population initialization. For chaotic mapping, Luo et al. [21] proposed a tent-line coupled chaotic mapping to initialize the population, which ensured that the GWO algorithm generated diverse populations; another improved GWO algorithm used a two-dimensional chaotic map to initialize the population [22]; Zhao et al. generated the GWO initial population through Chebyshev chaotic mapping, ensuring the diversity of the initial population and enhancing the global search ability of GWO [36]. In addition, some studies have integrated chaotic maps: Xu et al. applied an integrated mapping system (CLS) to GWO to increase its population diversity and accelerate the convergence of the algorithm [37]. Besides chaotic mapping, a pseudo-antithesis number generation method based on an opposition learning strategy was used to improve the distribution of the population [38]; another improved GWO generated its opposition wolves by a lens imaging learning strategy [39]. These population diversity strategies are successful in balancing the initial population distribution and improving the algorithm's performance.
Some algorithms have improved GWO performance by modifying and adjusting parameters. Song et al. [40] proposed IGWO, which enhanced exploration by changing the linear convergence factor into a nonlinear one; another improved grey wolf optimizer adjusted a nonlinear parameter of GWO based on polynomials [41] and showed accurate measurement results in the optimization of seepage parameters. However, these nonlinear strategies have only succeeded in improving the performance of GWO in some aspects. For example, the improved GWO in [42] benefited the convergence performance on unimodal functions but had a poor effect on multimodal functions. Besides parameter update equations, a fuzzy method [43] was used for the adaptive adjustment of the control parameters. The exploration-enhanced grey wolf optimizer (IEE-GWO) [44] used a nonlinear control parameter strategy, and it has been proven that IEE-GWO has a fast convergence rate when solving unimodal functions. There are many excellent parameter adjustment strategies to improve GWO, but these methods make the algorithm perform well only on specific problems.
Some improved GWO introduced the location update strategy, making GWO suitable for a variety of optimization problems. A new search strategy named dimension learning-based hunting (DLH) [45] was introduced in IGWO, which inherited from the individual hunting behavior of wolves and shared neighboring information; An improved GWO variant used two strategies, neighbor gaze cue learning (NGCL) and random gaze cue learning [46]. These two strategies can update the location of wolves and achieve a balance between exploration and exploitation; Besides, multi-stage grey wolf optimizer (MGWO) [47] can update wolves at three stages and maintain convergence speed.
In fact, some other variants hybridize GWO with other search strategies or metaheuristic algorithms to improve its performance. For example, the genetic algorithm (GA) and GWO were hybridized to reduce the dimension of the obtained feature vector [48]. In another similar work, a novel improved GWO called the collaboration-based hybrid GWO-SCA optimizer was developed [49]; experimental results indicated that it was a high-performing algorithm in global optimization. With the same goal, a recently developed metaheuristic optimization algorithm called hybrid PSO-GWO [50] has been proposed to improve exploitation and exploration ability.
3. Grey wolf optimizer
The Grey Wolf Optimizer [20] is a swarm intelligence optimization algorithm. Compared to other population-based optimization algorithms, GWO has significant differences in its hunting mechanism and mathematical model. In the hunting mechanism, GWO uniquely simulates predation behavior according to the hierarchy found in nature. The grey wolves are divided into four grades: alpha (α), beta (β), delta (δ) and omega (ω). In the group, each level of grey wolf has a different responsibility. As the leader, the α wolf has a powerful effect on the group and determines the hunting direction of the pack; the β wolf is in the second level of the hierarchy, helps the α wolf in decision-making and dictates instructions to wolves in the lower hierarchy; the δ wolf is in the third level of the hierarchy and follows the arrangements of α and β; ω is at the bottom of the hierarchy. GWO hunting is abstracted as searching for optimal values. Specifically, it can be described by the following mathematical model.
3.1. Mathematical model for encircling the prey
The first process of hunting is encircling the prey. Eq (2) updates the position of grey wolf by calculating the distance between the grey wolf and the prey.
(1) \( D = \lvert C \cdot X_p(t) - X(t) \rvert \)

(2) \( X(t+1) = X_p(t) - A \cdot D \)
where Xp denotes the prey position, X(t) refers to a grey wolf position, X(t+1) represents the location of a grey wolf in the next iteration, and D represents the distance between the grey wolf and its prey. C is the oscillation factor and A is the convergence factor. When |A|>1, the wolves conduct a large-scale search over the global scope; when |A|<1, the wolves conduct a fine search in local areas. They can be expressed by the following formulas:
(3) \( A = 2a \cdot r_1 - a \)

(4) \( C = 2 \cdot r_2 \)

(5) \( a = 2 - \dfrac{2t}{T} \)
where r1, r2 ∈ [0,1] are random variables, a represents the distance control parameter that decreases linearly from 2 to 0, t is the current iteration number and T is the maximum number of iterations.
3.2. Mathematical model for hunting mechanism
When the grey wolves track the prey's position, the α wolf leads the β and δ wolves to surround the prey in nature. However, in a simulated search space we do not know the prey location. In order to build the hunting model, the optimal, sub-optimal, and third-optimal solutions are used as the α, β and δ wolf positions. We suppose that these three solutions guide the other wolves to attack the prey. The positions of the first three wolves are updated as follows.
(6) \( D_\alpha = \lvert C_1 \cdot X_\alpha - X \rvert,\quad D_\beta = \lvert C_2 \cdot X_\beta - X \rvert,\quad D_\delta = \lvert C_3 \cdot X_\delta - X \rvert \)

(7) \( X_1 = X_\alpha - A_1 \cdot D_\alpha,\quad X_2 = X_\beta - A_2 \cdot D_\beta,\quad X_3 = X_\delta - A_3 \cdot D_\delta \)
where Xα, Xβ, Xδ represent the current positions of α, β and δ; Dα, Dβ, Dδ represent the distances between the three wolves and the prey; X1, X2, X3 represent the positions dictated by the α, β and δ wolves; and A1, A2, A3 are defined as in Eq (3) and respectively represent the convergence factors of α, β and δ. At this time, the three wolves are the closest to the prey in the pack. Therefore, individual positions are updated according to the α, β and δ wolf positions:
(8) \( X(t+1) = \dfrac{X_1 + X_2 + X_3}{3} \)
The wolves continuously search for the optimal solution according to the above process. After hunting, Xα is taken as the location of the prey.
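To make the update rules above concrete, the following minimal NumPy sketch implements Eqs (1)–(8) for a toy objective (the sphere function); all parameter values and the objective are illustrative and do not reproduce the experimental settings of this paper.

```python
import numpy as np

def gwo(obj, dim=30, n=50, t_max=500, lb=-100.0, ub=100.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n, dim))                  # wolf positions
    leaders = [(np.inf, np.zeros(dim))] * 3            # best-so-far alpha, beta, delta
    for t in range(t_max):
        for i in range(n):
            f = obj(X[i])
            # keep the three best solutions found so far as alpha, beta, delta
            leaders = sorted(leaders + [(f, X[i].copy())], key=lambda p: p[0])[:3]
        a = 2 - 2 * t / t_max                          # Eq (5): a decreases linearly from 2 to 0
        for i in range(n):
            x_new = np.zeros(dim)
            for _, leader in leaders:
                A = 2 * a * rng.random(dim) - a        # Eq (3)
                C = 2 * rng.random(dim)                # Eq (4)
                D = np.abs(C * leader - X[i])          # Eq (6): distance to the leader
                x_new += leader - A * D                # Eq (7): X1, X2, X3
            X[i] = np.clip(x_new / 3.0, lb, ub)        # Eq (8): average of X1, X2, X3
    return leaders[0]                                  # (best fitness, alpha position)

best_f, best_x = gwo(lambda x: float(np.sum(x ** 2)))  # sphere function as a toy objective
```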
Compared to other population-based optimization algorithms, the grey wolf optimizer has some advantages. For example, the grey wolf optimizer has a simple structure with few parameters; Grey wolf optimizer can find the optimal results quickly due to its unique hierarchy; In addition, the low time complexity of the grey wolf optimizer allows it to play an important role in practical optimization problems. However, there are still some disadvantages. For example, the grey wolf optimizer is prone to fall into local extremes. Therefore, proposing an effective improved grey wolf optimizer is one of our research objectives.
4. Improved grey wolf optimizer (SGWO)
To improve the optimization performance of the GWO algorithm, we proposed SGWO based on the adaptive information interaction mechanism. The SGWO algorithm was described in terms of implementation method and algorithm steps.
4.1. Circle population initialization (cGWO)
In GWO, the optimal value is greatly constrained by the initial positions. Compared with a purely random search, chaotic maps are widely applied to generate the initial population because of their randomness. However, different chaotic maps have different effects. To find the optimal value quickly, we analyzed and compared the Sobol, Logistic, Iterative, and Circle maps [21,22] in Fig 1.
Fig 1 gives the distribution of 200 population individuals after 30 iterations for each map. For better display, the problem dimension was set to 2; the two axis labels represent the two dimensions, and the search interval of the variables was set to [0, 1]. According to Fig 1, all four mapping distributions are uniform, and the grey wolf group generated by the circle map is more evenly distributed in space than with the other maps. At the same time, some individuals of this map lie at the boundary, which affects the overall efficiency of the algorithm; compared with the other mappings, the circle map has more boundary individuals. To enhance the algorithm's ability to deal with extreme value problems, and considering the experimental results, this paper finally chose the circle map. The circle map [51] model is as follows:
(9) \( x_{t+1} = \operatorname{mod}\!\left(x_t + 0.2 - \dfrac{0.5}{2\pi}\sin(2\pi x_t),\, 1\right) \)
where xt represents the chaotic value of a population individual at the t-th mapping step. The circle map is used only once, in the initialization step, to generate the initial population [20]; during the iterations, GWO takes this initial population as the starting point for the position updates. The circle map balances the population distribution: when the algorithm falls into a local extreme, a uniform population distribution can help the wolves move to the next location. Therefore, population initialization plays a role in improving the exploration ability of SGWO.
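As a rough illustration, the snippet below initializes a population with the circle chaotic map; the map constants (0.2 and 0.5) follow the form commonly used in chaos-based initialization and are assumptions here, as is the search interval.

```python
import numpy as np

def circle_init(n, dim, lb, ub, seed=0):
    """Generate an initial grey wolf population with the circle chaotic map."""
    rng = np.random.default_rng(seed)
    z = rng.random(dim)                                   # chaotic seed values in (0, 1)
    pop = np.empty((n, dim))
    for i in range(n):
        # circle map: z <- mod(z + 0.2 - (0.5 / (2*pi)) * sin(2*pi*z), 1)
        z = np.mod(z + 0.2 - (0.5 / (2 * np.pi)) * np.sin(2 * np.pi * z), 1.0)
        pop[i] = lb + z * (ub - lb)                       # map chaotic values into the search interval
    return pop

population = circle_init(n=50, dim=30, lb=-100.0, ub=100.0)
```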
4.2. Information interaction mechanism (iGWO)
In the information interaction mechanism, the hunting process is modeled as an information interaction process among wolves: the hunting path serves as the channel, the α position as the source point, the β position as the transmission station, and the subordinate wolves as the signal receiving points. The Cauchy mutation is used to perturb the position of the source point, and the golden sine algorithm optimizes the information transmission process and enhances information exchange between wolves. Mathematically, the information interaction mechanism is constructed in two steps, explained as follows.
- (1) disturbing source point
In GWO, the α wolf position is the source point, which determines the attack direction of the pack. If the leader's position deviates, it prolongs the search time and reduces the search accuracy. Thus, the Cauchy mutation [52], with its excellent local exploration ability, is used to optimize the head wolf, so that the α wolf can jump out of local extreme values and premature convergence is avoided. The standard Cauchy distribution function is as follows:

(10) \( f(x) = \dfrac{1}{\pi\,(1 + x^{2})},\quad x \in (-\infty, +\infty) \)

The standard Cauchy density decays from a flat peak toward both ends. The long trailing tails increase the perturbation probability and allow the head wolf to jump out of a local extremum quickly; the flat peak reduces the search time in the adjacent area and enhances the ability to find the global optimal solution. The standard Cauchy operator is used to randomly disturb the α wolf's position. The position update formula for the α wolf is as follows:
(11) \( X_1' = X_1 + X_1 \otimes \operatorname{Cauchy}(0,1) \)
X1 is defined in Eq (7) and represents the position dictated by the leadership of α in GWO; X1 is calculated from Xα. In Eq (11), X1 is used as the initial position of α, and X1' is the new location of the α wolf, i.e., the final position of α in SGWO. The Cauchy mutation helps the α wolf pass the best hunting position to the pack, so the wolves can quickly close in on the prey and the search speed is increased.
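A hedged sketch of this perturbation is shown below; the exact form of Eq (11) is not reproduced, and the additive Cauchy term used here is one common choice assumed for illustration.

```python
import numpy as np

def cauchy_perturb(x1, rng):
    """Perturb the alpha-led position X1 with a standard Cauchy mutation (assumed form of Eq (11))."""
    return x1 + x1 * rng.standard_cauchy(x1.shape)     # heavy-tailed jump away from local extrema

rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, 30)                        # X1 from Eq (7)
x1_new = cauchy_perturb(x1, rng)
```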
- (2) optimize information transmission process
In GWO, β wolf location belongs to the transmission station in the communication channel. However, the suboptimal value cannot determine the distance from β wolf to α wolf and δ wolf. Therefore, the information will be biased when β wolf transmits α wolf position to the subordinate wolves. When the algorithm is solving highly complex optimization problems, it is difficult to fully explore the solution space, which affects the search accuracy.
The golden sine algorithm (Golden-SA) [53] is a new metaheuristic optimization algorithm. All points on the sine function are scanned via the unit circle, so the solution space is fully traversed and the optimal solution can be found by Golden-SA. The solution update process is the core of the golden sine algorithm:
(12) \( V(t+1) = V(t)\,\lvert\sin R_1\rvert + R_2 \sin R_1 \,\lvert x_1 D(t) - x_2 V(t)\rvert \)

where V(t) refers to the current individual position and D(t) refers to the current optimal position. R1 is a random variable in [0, 2π] and R2 is a random variable in [0, π]; they control the distance and direction of movement, respectively. The golden ratio is τ = (√5 − 1)/2, and the coefficients x1 and x2 are obtained from τ:

(13) \( x_1 = -\pi + (1-\tau)\cdot 2\pi \)

(14) \( x_2 = -\pi + \tau\cdot 2\pi \)

These two coefficients narrow the space through a spiral search and keep the search approaching the optimal solution.
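The following short sketch applies the Gold-SA update of Eq (12) with the golden-section coefficients above; the toy positions are assumptions for illustration only.

```python
import numpy as np

TAU = (np.sqrt(5) - 1) / 2              # golden ratio tau
X1 = -np.pi + (1 - TAU) * 2 * np.pi     # golden-section coefficient x1
X2 = -np.pi + TAU * 2 * np.pi           # golden-section coefficient x2

def gold_sa_step(v, d, rng):
    """One golden sine move of the current position v towards the current best position d (Eq (12))."""
    r1 = rng.uniform(0, 2 * np.pi)      # controls the moving distance
    r2 = rng.uniform(0, np.pi)          # controls the moving direction
    return v * np.abs(np.sin(r1)) + r2 * np.sin(r1) * np.abs(X1 * d - X2 * v)

rng = np.random.default_rng(0)
v = rng.uniform(-5.0, 5.0, 10)          # current individual position
d = np.zeros(10)                        # current optimal position (toy example)
v_next = gold_sa_step(v, d, rng)
```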
Inspired by the golden section, the golden sine algorithm was incorporated into the GWO algorithm to change the movement of β wolf. The position update formula of the β wolf is as follows:
(15) \( D_\beta' = \lvert x_1 X_\beta - x_2 X(t)\rvert,\qquad X_2' = X(t)\,\lvert\sin R_1\rvert + R_2 \sin R_1 \, D_\beta' - A_2 D_\beta' \)
where Dβ' represents the new distance between the β wolf and the prey, and X2' represents the new position of the β wolf in SGWO. X(t) is defined in Eq (1) and refers to a grey wolf position. R1 and R2 are defined in Eq (12) and represent random variables in [0, 2π] and [0, π]. Eq (15) is an update of Eq (7), in which A2 still represents the convergence factor of β.
An analysis based on Fig 2 and Eq (15) shows that R1 and R2 constantly adjust the moving direction and moving distance of the β wolf, so that β can fully capture the information difference between the α and δ wolves. More specifically, the β wolf is kept at the golden-section position between the α and δ wolves (as in Fig 2A). This method enhances information exchange in GWO. In addition, SGWO can scan all points on the unit circle and continuously enclose the wolves within the sine function (as in Fig 2B). Thus, the wolves gradually approach the prey position (the global optimal solution), improving search speed and efficiency.
This paper transplants the Cauchy mutation and the golden sine algorithm into the GWO algorithm as an information interaction mechanism between wolves, which promotes the information exchange between α, β and the superior and subordinate wolves. The α and β wolves can release their decision results to the subordinate wolves from the best transmission positions. The improved SGWO remedies the shortcomings of the traditional GWO algorithm and guides the wolves to approach the prey faster.
4.3. Adaptive position update (aGWO)
The individual position update is a key process in hunting. However, GWO always refers only to the three leading wolf locations and maintains a constant update mechanism, making it difficult to balance global and local exploitation capabilities. Inspired by learning-rate decay in machine learning [54], an adaptive weight ω is introduced into the location update; ω is defined in Eq (17). The updated position formula is as follows.
(16)
(17)
where a is the distance control parameter defined in Eq (5), and X3 represents the position dictated by δ in GWO. X1' represents the updated position of α obtained by the Cauchy distribution, and X2' represents the updated position of β obtained by the golden sine algorithm. X(t+1)' represents the next-iteration position of a wolf, which is also the final position of the wolf in SGWO.
Because traditional inertia weights are set artificially, they cannot conform to the wolves' hunting process. The proposed adaptive weight factor incorporates the distance control parameter so that the algorithm adjusts its search range autonomously in different periods: in the early stage of the iteration, the algorithm searches the solution space globally with large steps, and in the later stage it searches the region finely. Setting p to 0.25 avoids losing the optimal solution and reducing the accuracy of the algorithm.
From Fig 3, in the early iterations, ω is large to help jump out of local extremes; in the late iterations, ω is smaller to improve local search capability. Integrating the adaptive weight into traditional GWO can balance global exploitation ability and local exploration ability, and find the global optimal solution quickly.
The adaptive location update mechanism is suitable for other optimization algorithms based on population, such as whale optimization algorithm (WOA) and white shark optimizer (WSO), etc. In these algorithms, this mechanism is applied to improve the formula for location update. In practice, this mechanism automatically adjusts the search step of populations by changing the parameter values, which ensures that the algorithm has global exploitation ability and local exploration ability.
4.4. Complexity and convergence analysis of SGWO
4.4.1. Complexity analysis.
The time complexities of the algorithms in the comparative experiments are as follows. From the pseudo-code, all improved strategies are contained within the GWO cycle optimization process, so SGWO and GWO have the same time complexity, O(SGWO) = O(T × n × Dim), where T is the maximum number of iterations, n is the population size, and Dim is the dimension. SGWO has few parameters, and the final order is: GWO ≈ SGWO ≈ WOA ≈ SCA ≈ SSA ≈ ASO < MFO < MVO.
4.4.2. Exploitation and exploration analysis.
In the exploitation phase, GWO completes the hunting task by reducing the value of a, which decreases from 2 to 0 over the course of the iterations. When |A|>1, the wolves deviate from their prey; when |A|<1, the wolves attack their prey. However, this approach leads to longer exploitation times and an inability to accurately locate the prey. In SGWO, we introduce an information interaction mechanism in which the β wolf can accurately convey the position of the α wolf to its subordinate wolves at the golden section, so the wolves can quickly approach their prey. It is worth mentioning that the golden sine algorithm can scan all points on a unit circle and continuously draw the wolves into the sine function. Therefore, the information interaction mechanism can shorten the exploitation time of the wolves. At the same time, we introduce an adaptive weight ω into SGWO, which adjusts the search range independently at different stages. As p increases, ω decreases rapidly, allowing SGWO to search the solution space globally with larger steps. Therefore, both the information interaction mechanism and the adaptive weight can improve the exploitation ability and ensure that the algorithm quickly converges to the optimal value.
In the exploration phase, GWO is prone to stagnation in local solutions. We introduce circle mapping and the Cauchy distribution function to solve this problem. In the initial stage, circle mapping increases population diversity, which helps individuals caught in extremes find neighbors quickly. When the algorithm stalls, the α wolf changes position through the Cauchy mutation and once again leads the pack out of the stagnant region. In addition, the adaptive weight ω also takes effect during the exploration phase: at the end of the iteration, the amplitude of ω decreases as the value of λ increases, and the algorithm searches more finely within this interval. Therefore, the adaptive weight can effectively balance the exploration and exploitation stages. SGWO thus attends to both exploitation and exploration, improving the convergence speed and efficiency of GWO.
4.4.3. Convergence analysis with Markov process and probability 1.
Previous research has indicated that the performance of metaheuristic algorithms can be improved. To date, however, no broad study has been performed on the theoretical analysis of metaheuristic algorithms. In this context, we innovatively introduce the Markov process and probability analysis to prove the convergence performance of SGWO.
- (1) convergence analysis with Markov process
Definition 1. Let X = {x | x ∈ Y} be the grey wolf state space, where x1, …, xi ∈ X, Y refers to the solution space, and xi refers to the state of an individual wolf. Let φ = (x1, x2, …, xN) be the wolves' state, composed of the states of the N individual wolves, and let Φ be the wolves' state space, which is constituted by all possible wolves' states.
Theorem 1. In the SGWO algorithm, the wolves' state sequence {φ(t): t > 0} is a finite homogeneous Markov chain, and the corresponding Markov process has absorbing states.
Proof. (1) finite homogeneous Markov chain
Considering the wolf state transition probabilities given in reference [55], the transition probability of the wolves' state is determined by the transition probabilities of the individual wolf states. According to Eq (15), the state X(t) is related only to the state X(t−1) at the previous moment, the vector coefficients Ci, and the distances Dα, Dβ and Dδ between the first three wolves and their prey. Thus, according to the definition of the Markov chain, {φ(t): t>0} has the Markov property.
Since the search space of any optimization problem is finite, each xi is finite, and the state space X is also finite. Because φ is composed of N wolf states and X is a countable set, φ is finite. Similarly, the wolves' state space Φ is also finite. Therefore, {φ(t): t>0} is a finite Markov chain.
According to Eq (16), it is clear that X(t) is only related to the state X(t−1) at the previous moment, not the number of iterations. Thus, {φ(t):t>0} is a finite homogeneous Markov chain.
Proof. (2) Markov process with absorbing states
During each iteration, the algorithm records the current optimal top three wolf positions, so SGWO still uses an elite retention strategy. Thus, the corresponding Markov process has absorbing states.
- (2) convergence analysis with probability 1
Theorem 2. The SGWO algorithm converges globally with probability 1.
Proof. To prove Theorem 2, we proceed in two steps: first we prove that SGWO is globally convergent, and then we prove that the probability of convergence is 1. From the literature [56], the conventional GWO algorithm is convergent, so that X(t+1)→Xg(t) when t→∞. To prove the convergence of the SGWO algorithm, it is only necessary to prove that X(t+1)'→Xg(t) when t→∞. From Eq (5), a→0 when t→∞; hence, in Eq (17), ω→0 when t→∞. Thus, X(t+1)'→X(t+1)→Xg(t) when t→∞, and therefore SGWO is convergent.
SGWO then satisfies the necessary and sufficient condition for global convergence in reference [55]. Thus, SGWO is a globally convergent algorithm.
Assume that at some time t the state X(t) enters the global optimal solution set G; because SGWO retains the elite solutions, once the wolves enter G they cannot leave it. Let v(t) be the probability measure that the wolves' state at time t lies outside G. Since the state sequence is an absorbing Markov chain, v(t) is non-increasing and v(t)→0 as t→∞, so the probability that X(t)∈G tends to 1. We finally conclude that the SGWO algorithm is a globally convergent algorithm with a probability of 1.
5. SGWO-Elman model construction
5.1. Elman neural network
The Elman neural network is divided into four layers: the input layer, hidden layer, undertake layer, and output layer [1]. The connections among the input, hidden and output layers are similar to those of a feedforward network. The input layer units only serve for signal transmission, while the output layer units perform weighting. There are two types of excitation functions for hidden layer units, linear and nonlinear; generally, the excitation function is taken as the sigmoid nonlinear function [2]. The undertake layer is used to remember the output value of the hidden layer units at the previous moment and can be considered a delay operator with a one-step delay. The output of the hidden layer is fed back to the input of the hidden layer through the delay and storage of the undertake layer [3]. This connection method makes the network sensitive to historical data, and the internal feedback improves its ability to process dynamic information, thereby achieving dynamic modeling. The structure of Elman is shown in Fig 4.
The Elman model can be described as Eq (18), where y is the node vector of the output layer; x is the node vector of the hidden layer; u is the input vector; xc is the feedback state vector; ω1 is the connection weight from the hidden layer to the undertake layer; ω2 is the connection weight from the input layer to the hidden layer; ω3 is the connection weight from the hidden layer to the output layer; and b1 and b2 are the thresholds of the input layer and the hidden layer.
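For readers without access to Eq (18), a standard Elman formulation consistent with these variable definitions is sketched below; the exact placement of the thresholds and activations in the authors' Eq (18) may differ in detail.

\[
\begin{aligned}
x_c(k) &= x(k-1) \\
x(k)   &= f\!\left(\omega_1\, x_c(k) + \omega_2\, u(k-1) + b_1\right) \\
y(k)   &= g\!\left(\omega_3\, x(k) + b_2\right)
\end{aligned}
\]

where f is the sigmoid activation of the hidden layer and g is a linear activation of the output layer.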
5.2. SGWO-Elman model
When Elman performs a prediction task, it first randomly selects the initial parameter values, then continuously updates them through network training, and finally determines the best combination of parameters that fits the characteristics of the sample set. Because the initial parameters are selected blindly during training, the prediction accuracy of the network predictor is reduced and the training process is prone to falling into local extremes. Therefore, it is necessary to find the best parameters at the initial time in order to train a better network structure. Optimal network parameters can better train the network structure during the iterative process, which not only enhances the adaptability of the predictor to the dataset but also improves the prediction accuracy. We introduce SGWO into the parameter optimization of the Elman neural network and propose a new Elman prediction model (SGWO-Elman). This is another novel point of this paper.
The principle of the SGWO-Elman model is to recast the Elman network training problem as a weight optimization problem. Let the neural network structure be Net{ω1,ω2,ω3,b1,b2}. Let X = [x1, x2, …, xn] be the input and output prediction sample space, and let Y = [y1, y2, …, ym] be the sample space to be measured. Then the search optimization objective of this paper is as follows:
(19)
This paper takes the parameter combination of the Elman neural network as the training goal, and the initial predictor is generated according to Eq (19). The predictor is used as a grey wolf individual to obtain the initial population. Then, the minimum mean square error (MSE) is used as the fitness function:
(20) \( \mathrm{MSE} = \dfrac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^{2} \)

where m is the number of samples, yi is the measured value and ŷi is the predicted value.
SGWO continuously trains the network structure through iteration until the optimal parameter combination is determined; finally, the optimal network predictor is obtained. The Elman and SGWO-Elman optimization processes in the search space are depicted in Fig 5.
In Fig 5A, Elman uses a single-point search to find the optimization route by gradient descent, which easily falls into a local extremum. In Fig 5B, SGWO-Elman completes neuroevolution using the optimization algorithm, which realizes a multi-point search in the space. Compared with single-point optimization, SGWO can find the global optimal solution. As a result of training with the optimization algorithm, the network's search and parameter calculation abilities are improved.
The SGWO-Elman prediction model is specified as pseudo code and Fig 6.
SGWO-Elman Prediction Algorithm
input: datasets, network parameters, SGWO parameters
output: prediction results
1: Building the Elman network
2: for i = 1 to epochs do
3: Training network
4: end
5: Get initial Net{ω1,ω2,ω3,b1,b2}
6: Initialization of the gray wolf population using a circle map
7: while (t<tmax) do
8: for i = 1 to N do
9: for j = 1 to dim do
10: Calculate parameters A and C using Eqs (3) and (4)
11: Calculate α and β locations using Eqs (11) and (15)
12: Update individual position using Eq (16)
13: end for
14: end for
15: Calculate individual fitness values using Eq (20)
16: Update Xα,Xβ,Xδ locations
17: end while
18: Get the optimal Net{ω1,ω2,ω3,b1,b2}
19: SGWO-Elman prediction
20: Get prediction results
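As a complement to the pseudo code, the sketch below shows how one grey wolf (a flat parameter vector) can be decoded into Net{ω1,ω2,ω3,b1,b2} and scored with the MSE fitness of Eq (20); the layer sizes, decoding order and activations are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elman_mse(theta, U, Y, n_in, n_hid, n_out):
    """Fitness of one grey wolf: MSE of the Elman network encoded by the flat vector theta."""
    i = 0
    def take(shape):
        nonlocal i
        size = int(np.prod(shape))
        block = theta[i:i + size].reshape(shape)
        i += size
        return block
    w1 = take((n_hid, n_hid))      # undertake (context) layer -> hidden layer
    w2 = take((n_hid, n_in))       # input layer -> hidden layer
    w3 = take((n_out, n_hid))      # hidden layer -> output layer
    b1 = take((n_hid,))            # hidden layer threshold
    b2 = take((n_out,))            # output layer threshold
    xc = np.zeros(n_hid)           # undertake layer state (delayed hidden output)
    err = 0.0
    for u, y in zip(U, Y):
        x = sigmoid(w1 @ xc + w2 @ u + b1)     # hidden layer with feedback from the undertake layer
        y_hat = w3 @ x + b2                    # output layer
        err += float(np.mean((y - y_hat) ** 2))
        xc = x                                 # one-step delay operator
    return err / len(U)                        # Eq (20): mean squared prediction error

# toy usage: 3 inputs, 5 hidden nodes, 1 output, random data and one random wolf
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 5, 1
dim = n_hid * n_hid + n_hid * n_in + n_out * n_hid + n_hid + n_out
wolf = rng.uniform(-1.0, 1.0, dim)
U, Y = rng.random((20, n_in)), rng.random((20, n_out))
print(elman_mse(wolf, U, Y, n_in, n_hid, n_out))
```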
6. Results and discussion
6.1. SGWO comparative experiment
6.1.1. Experimental information.
- (1) comparison methods
To ensure experimental objectivity and fairness, SGWO was compared with SCA, MFO, WOA, GWO, the latest variant of GWO (mGWO) [57], the white shark optimizer (WSO) [34] and the covariance matrix adaptation evolution strategy (CMAES) [27]. The performance of every algorithm was investigated on 8 benchmark functions. Table 1 shows the details of the 8 benchmark functions.
- (2) evaluation criteria
The initialization parameters of all algorithms were the same: the population size is 50 and the maximum number of iterations is 1000. For performance testing, 30 runs were performed in 50, 100 and 500 dimensions, respectively. Experimental results are presented in terms of:
- Best of 30 runs
- Worst of 30 runs
- Mean of 30 runs
- Standard deviation of 30 runs
- Non-parametrical statistical tests
- Wilcoxon test and ranking.
6.1.2. Exploitation analysis.
Tables 2–4 show the results for SCA, MFO, WOA, GWO, mGWO [57], WSO [34], CMAES and SGWO in 50, 100 and 500 dimensions, respectively. From Tables 2–4, several conclusions can be obtained.
- It may be noted that the unimodal functions are suitable for testing the exploitation performance of algorithms. On the unimodal functions (f 1—f 4), all results of SGWO reach the theoretical optimum in all dimensions. At the same time, the optimization results of SGWO in the three dimensions are better than those of the other comparison algorithms. This indicates that SGWO has better global exploitation capability on unimodal functions. Therefore, SGWO has better stability than the other 7 algorithms.
- From comparison algorithms, the accuracy of SGWO, GWO, mGWO and WOA is higher than other algorithms. In particular, the results of MFO in single-peak function obviously deviate from the theoretical optimal value. Although WOA has a good comprehensive effect on most functions, the optimization effect of f 3 also deviates. Although mGWO and SGWO have similar search results on some functions, there is still a gap in the f 3 and f 4 functions. In addition, advanced WSO performs poorly on 7 functions in different dimensions. Compared to advanced metaheuristic algorithms, SGWO still outperforms WSO in all functions and dimensions. Therefore, SGWO has more advantages in exploitation ability.
- SGWO achieves the best results in all experiments in different dimensions. The results of the other algorithms deteriorate significantly as the dimensionality increases, whereas SGWO is not susceptible to increased dimensions. In 500 dimensions, SGWO still converges to the theoretical optimal value on f 1—f 4, f 6 and f 8. On f 5, the results of SGWO at 500 dimensions are better than those at 50 and 100 dimensions. On f 7, the results of SGWO are the same in the three dimensions. This proves that SGWO not only has prominent advantages in low dimensions, but also exhibits the best experimental results in 500 dimensions. Thus, SGWO is more suitable for solving high-dimensional problems and scales well with dimensionality.
To better compare the convergence speed of different algorithms, the convergence curves of two unimodal functions (f 1, f 2) and two multimodal functions (f 6, f 8) were analyzed in Fig 7.
- From unimodal functions, SGWO needs 300 iterations in f 1 function to converge to the theoretical optimal value, and 500 iterations in f 2 function. GWO has not reached the theoretical optimal value after 1000 iterations. The optimization results of other algorithms, including mGWO and WSO, did not change significantly. This further demonstrates that SGWO can improve global exploitation capabilities.
- With the increase of dimension, other algorithms change obviously, while SGWO ensures better convergence speed and accuracy. Thus, SGWO has significant advantages in global exploitation ability.
6.1.3. Exploration analysis.
Compared to unimodal functions, multimodal functions have many local optimizations, which makes them more suitable for testing the exploration capabilities of algorithms.
- For f 6 and f 8, SGWO still converges to the optimal value in different dimensions. For other algorithms such as WSO and SCA, they perform poorly on these two functions. With dimensions increasing, the results of other algorithms gradually decrease. Thus, SGWO is able to provide very competitive results on f 6 and f 8. This indicates that SGWO has a strong advantage in jumping out of the local extreme value and SGWO has better local exploration capability.
- For f 7, neither GWO nor SGWO makes significantly more progress than the other algorithms, which shows that most metaheuristic algorithms are not well suited to optimizing f 7. The experimental results of SGWO are still slightly better than those of the other algorithms on f 7; the inherent defects of GWO limit the effect of SGWO, yet SGWO still performs better than the other algorithms.
- For f 5, the results of all algorithms are not significantly different in the same dimension, but SGWO still has an advantage, which indicates that SGWO exhibits good performance on complex functions. As the dimension increases, SGWO achieves the best results at 500 dimensions, indicating that SGWO retains its advantage when dealing with high-dimensional problems.
- From Fig 7, the SGWO algorithm has a faster search speed in the same dimension compared to the state-of-the-art WSO and mGWO. The SGWO curve has fewer turning points, while the other algorithms fall into local extreme points many times. Because the SGWO algorithm incorporates a hybrid-strategy leadership optimization mechanism, the head wolf is prevented from falling into local extrema through random disturbance. Therefore, SGWO has good local exploration capability.
- From multimodal functions, the convergence effect of SGWO, mGWO and WOA functions is obviously faster than other algorithms. At the end of iteration, the optimization results of other algorithms are not affected by the increase in iteration times. SGWO has an outstanding advantage over single-peaked functions, but the optimization performance and convergence speed still need to be improved.
6.1.4. Non-parametrical statistical tests.
A full statistical analysis of the optimizer comparison must be presented based on significant non-parametric tests. As the non-parametric test, the Friedman test [58] was used to examine the overall performance of all algorithms. The null hypothesis in this test is that all algorithms perform equally; the alternative hypothesis is that at least one algorithm differs from the others. We used the Friedman test to analyze the results of Tables 2–4. Table 5 shows the results of the Friedman test.
From Table 5, the p-values for all 3 dimensions are smaller than 0.05. Therefore, the null hypothesis is rejected, which indicates that the algorithms differ significantly. In this case, we use the Nemenyi post-hoc test [59] to adjust the results for pairwise comparisons. The Nemenyi test requires calculating the critical difference (CD):
(21) \( CD = q_{\alpha}\sqrt{\dfrac{k(k+1)}{6N}} \)
where k represents the number of algorithms and N represents the number of functions. After calculation, CD = 4.0429. To calculate the statistic, we rank the algorithms' performance on each problem and compute the mean rank of each algorithm. Table 6 shows the mean ranks. Table 7 shows the mean rank difference between each algorithm and SGWO.
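A hedged sketch of this Friedman + Nemenyi procedure is given below; the random results matrix and the tabulated q value for k = 8 algorithms (about 3.031 at α = 0.05) are assumptions for illustration.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

rng = np.random.default_rng(0)
results = rng.random((8, 8))        # rows: benchmark functions, columns: algorithms (toy data)
N, k = results.shape                # N functions, k algorithms

# Friedman test across algorithms (one sample of per-function results per algorithm)
stat, p_value = friedmanchisquare(*[results[:, j] for j in range(k)])

# mean rank of each algorithm (a lower error gives a better, i.e. smaller, rank)
mean_ranks = rankdata(results, axis=1).mean(axis=0)

# Nemenyi critical difference, Eq (21)
q_alpha = 3.031                     # assumed tabulated value for k = 8, alpha = 0.05
cd = q_alpha * np.sqrt(k * (k + 1) / (6 * N))
print(p_value, mean_ranks, cd)
```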
If the difference between the mean ranks exceeds CD, the hypothesis that the two algorithms have the same performance is rejected. From Table 7, the mean rank differences of SCA, MFO and CMAES exceed CD in all dimensions, indicating that SCA, MFO and CMAES all differ significantly from SGWO. There is no significant difference between the other algorithms and SGWO.
6.1.5. Wilcoxon test and ranking.
The Friedman + Nemenyi test can express the overall performance and individual differences of SGWO. However, it is still necessary to evaluate the comparative results of each algorithm on different functions. We used the Wilcoxon rank sum test [60]. Table 8 shows the p values of SGWO and other algorithms, which are at p = 0.05 significance level.
It can be seen from Table 8 that SGWO is statistically superior to all other algorithms except mGWO and WOA. In 50 and 100 dimensions, the results of mGWO and WOA are not applicable on f 6 and f 8, which shows that their difference from SGWO is lower. In 500 dimensions, the results of WOA and GWO on f 6 and f 8 are higher than in 50 and 100 dimensions, indicating that there is a significant difference between SGWO and these two algorithms. Therefore, the SGWO algorithm is not affected by dimension and can be extended to high dimensions. SCA, MFO, WSO and CMAES have the same result on different functions, which indicates that SGWO is significantly different from SCA, MFO and WSO; however, CMAES differs less from SGWO in the overall comparison of Table 7. All the above analyses are consistent with the results in Table 7. Meanwhile, all results are the same on f 1—f 4, but the result on f 5 is higher than on the other functions, showing that each algorithm has a weaker optimization effect on f 5. As the dimension increases, the results show little difference across dimensions.
In conclusion, SGWO is superior to the other comparison algorithms; it has significantly better optimization performance and comprehensive strength. In addition, we used MAE to rank all algorithms [61]. The MAE expression is as follows:
(22) \( \mathrm{MAE} = \dfrac{1}{N_f}\sum_{i=1}^{N_f}\left\lvert \mathrm{Mean}_i - o_i \right\rvert \)
where Meani is the mean value obtained by the algorithm, oi is the theoretical optimal value of the benchmark function, and Nf is the number of benchmark functions. Table 9 shows the MAE under different dimensions. Table 10 shows the overall MAE of each algorithm, computed as MAE = (MAE50 + MAE100 + MAE500)/3.
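A small sketch of the MAE ranking metric of Eq (22) follows; the example values are purely illustrative.

```python
import numpy as np

def mae_rank(means, optima):
    """Mean absolute error between algorithm means and the theoretical optima over Nf functions."""
    means, optima = np.asarray(means, float), np.asarray(optima, float)
    return float(np.mean(np.abs(means - optima)))

print(mae_rank([1e-8, 2.1e-3, 0.0], [0.0, 0.0, 0.0]))
```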
From Table 9, the algorithms rank differently in each dimension. As the dimension increases, the MAE values change significantly for the other algorithms, but SGWO maintains the optimal level. This indicates that SGWO has strong stability and is not easily affected by dimension changes. In 50 and 100 dimensions, the ranking of the eight algorithms is the same: SGWO > mGWO > GWO > CMAES > WSO > SCA > MFO > WOA. CMAES ranks higher than GWO and mGWO in 500 dimensions, which shows that the performance of each algorithm differs across the three dimensions. GWO ranks below SGWO in all three dimensions, which proves that, although the comprehensive performance of GWO is already better than that of the other comparison algorithms, the improved strategy proposed in this paper significantly improves the optimization effect of GWO. The overall ranking in Table 10 is SGWO > CMAES > mGWO > GWO > WSO > SCA > MFO > WOA.
Fig 8 shows the box convergence diagram of two benchmark functions in different dimensions. The two benchmark functions are a unimodal function f 4 and multimodal functions f 6 respectively.
From Fig 8, the fitness values of SGWO are lower than those of the other comparison algorithms, even close to zero, showing that the improved strategy based on the adaptive information interaction mechanism is effective for the traditional GWO. The median of SGWO is lower than that of the other algorithms for every dimension and peak type, showing that SGWO obtains a better optimization effect after multiple iterations. At the same time, the interquartile range of SGWO is shorter than that of the other algorithms, which indicates that the optimization results of SGWO are more concentrated for each function and dimension.
6.2. Three strategies comparative experiment
6.2.1. Experimental information.
In order to analyze the impact of different strategies on the SGWO algorithm, we conducted comparative experiments on four algorithms. cGWO is the first strategy “Circle population initialization”; iGWO is the second strategy “Information interaction mechanism”; aGWO is the third strategy “Adaptive position update”; aWOA is the application of the third strategy to WOA.
To ensure experimental objectivity and fairness, the initialization parameters of all algorithms were the same: the population size is 50 and the maximum number of iterations is 1000. For performance testing, 30 runs were performed in 50, 100 and 500 dimensions, respectively. Experimental results are presented in terms of:
- Best of 30 runs
- Standard deviation of 30 runs.
6.2.2. cGWO analysis.
Table 11 shows the results for GWO, cGWO, iGWO, aGWO, aWOA and SGWO in 50 dim, 100 dim, and 500 dim, respectively. Fig 9 shows the convergence curves of different algorithms in unimodal functions f 2 and multimodal functions f 7.
From Table 11, compared to GWO, the results of cGWO have improved slightly in all functions of different dimensions. This indicates that the circle population initialization strategy can improve the optimization ability of GWO. However, the improvement effect of cGWO is weaker than iGWO, aGWO and SGWO. Specifically, circle population initialization was used only once in the initialization step, which weakened the effect of cGWO.
From Fig 9, on the unimodal function f 2, although the convergence speed of the cGWO algorithm is slightly faster than that of GWO, it is still not as good as the other strategies. On the multimodal function f 7, the convergence speed of cGWO is better than that of GWO and iGWO. At the same time, the number of transitions of cGWO is smaller than that of aGWO, aWOA and SGWO. This indicates that circle population initialization can help cGWO jump out of local extrema. Therefore, circle population initialization not only improves the exploration ability of GWO but also contributes to improving SGWO.
6.2.3. iGWO analysis.
From Table 11, compared to GWO, the results of iGWO improve significantly on all functions in different dimensions. On the unimodal functions f 1—f 4, the results of iGWO are improved dozens of times. On f 6 and f 8, iGWO reaches the theoretical optimal value. This indicates that the information interaction mechanism can improve the convergence ability.
From Fig 9, on the unimodal function f 2, the convergence speed of iGWO is higher than that of GWO and cGWO. On the multimodal function f 7, the convergence speed of iGWO is lower than that of GWO, cGWO and aWOA at the beginning of the iteration, but higher than theirs at the end of the iteration. Therefore, the information interaction mechanism generally contributes to the efficiency of SGWO.
6.2.4. aGWO analysis.
From Table 11, aGWO can reach the theoretical optimal value in f 1—f 4, f 6 and f 8. The results of aGWO are not significantly different from SGWO. This indicates that adaptive position update strategy can improve optimization performance of GWO and play an important role in SGWO. Meanwhile, information interaction mechanism is the best strategy compared to the other two strategies. From Fig 9, the convergence speed of aGWO is the same as SGWO. And they can quickly converge to the optimal value.
Meanwhile, we incorporate the adaptive position update strategy into WOA. Although the convergence performance of aWOA is not as good as that of aGWO, it is still superior to GWO. From Table 10, it can be seen that GWO performs better than WOA; therefore, aGWO > aWOA > GWO > WOA. The adaptive position update strategy can thus also improve the optimization performance of WOA, which further proves that it is an effective strategy.
6.3. Sensitivity analysis of parameters
The sensitivity of the two control parameters of Eq (17) is investigated in this section. These two parameters are λ and p, which together control the change of ω over the iterations. Meanwhile, ω plays an important role in balancing exploration and exploitation. Therefore, it is necessary to conduct a sensitivity analysis on λ and p.
Table 12 represents ω mean by 1000 iterations under various parameter combinations. As shown in Table 12, when p is constant, the mean value of ω gradually decreases as λ increases. When λ is constant, as p increases, the mean value of ω gradually decreases and decays faster. In the seventh experiment, when λ and p reached the maximum, the mean value of ω was the minimum. The results can be interpreted as saying that λ and p are negatively correlated with ω.
Fig 10 presents the ω curves over 1000 iterations under various parameter combinations. When λ is constant, as p increases, the value of ω decreases rapidly in the early stage of the iteration, which proves that p can shorten the exploitation time and quickly find the optimal value range for SGWO. At the end of the iteration, as the value of λ increases, ω transitions quickly to the exploration phase. As the iterations increase, the amplitude of ω decreases, which shows that SGWO refines the search. Therefore, we set λ to the maximum to improve SGWO's exploration performance. Although increasing p accelerates the decrease in ω, considering that SGWO needs to balance exploration and exploitation, we set p to 0.25.
6.4. SGWO for practical applications
6.4.1. SGWO for tension/compression spring design problem.
The objective of this problem is to minimize the weight of a tension/compression spring [62]. This problem can be abstracted into the following mathematical model. In the model, x1 is wire diameter, x2 is mean coil diameter, and x3 is the number of active coils.
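For reference, the formulation of this benchmark commonly used in the metaheuristics literature is sketched below; the constants are the standard ones and are assumed to match the model referenced above.

\[
\begin{aligned}
\min\ & f(x) = (x_3 + 2)\, x_2\, x_1^{2} \\
\text{s.t.}\ & g_1(x) = 1 - \frac{x_2^{3} x_3}{71785\, x_1^{4}} \le 0, \qquad
             g_2(x) = \frac{4 x_2^{2} - x_1 x_2}{12566\,(x_2 x_1^{3} - x_1^{4})} + \frac{1}{5108\, x_1^{2}} - 1 \le 0, \\
             & g_3(x) = 1 - \frac{140.45\, x_1}{x_2^{2} x_3} \le 0, \qquad
             g_4(x) = \frac{x_1 + x_2}{1.5} - 1 \le 0.
\end{aligned}
\]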
Table 13 shows the comparison of results of the tension/compression spring design problem. Table 13 suggests that SGWO finds a design with the minimum weight for this problem. This further proves that SGWO can be applied to practical problems and exhibits better performance.
6.4.2. SGWO for a large-scale optimization problem.
To prove the scalability of SGWO in large-scale optimization problems [63], we conducted a comparative experiment under 1000 dimensions. The experimental information is the same as that in section 6.1.1. Table 14 shows the results of 8 algorithms in f 1, f 2, f 6 and f 8.
From Table 14, SGWO can still find theoretical optimal values in large-scale optimization problems. Compared to other algorithms, SCA and MFO failed on f 2 and the results of WSO are also very poor on four functions. Therefore, SGWO is suitable for solving large-scale optimization problems and has strong stability.
6.5. SGWO-Elman comparative experiment
6.5.1. Experimental information.
- (1) datasets information
To verify the performance of SGWO-Elman, we selected six benchmark datasets from the UCI database (http://archive.ics.uci.edu/ml) and performed two groups of experiments. Because the collected datasets contain a few null values and characteristic indexes irrelevant to the study, they were preprocessed first. The processed data information is shown in Table 15. To eliminate dimensional inconsistency, the data were normalized before being input into the prediction model. Table 16 shows the number of hidden layers for the different datasets.
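A minimal sketch of this preprocessing step is given below, assuming pandas-style tabular data with numeric feature columns; the file name, column names and the choice of min-max normalization to [0, 1] are illustrative assumptions, not the paper's exact procedure.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, irrelevant_cols: list) -> pd.DataFrame:
    """Drop characteristic indexes irrelevant to the study, remove rows with
    null values, and min-max normalize each remaining column to [0, 1] so the
    inputs to the prediction model share a common scale."""
    df = df.drop(columns=irrelevant_cols).dropna()
    return (df - df.min()) / (df.max() - df.min())

# Illustrative usage on a hypothetical UCI dataset file:
# data = preprocess(pd.read_csv("air_quality.csv"), irrelevant_cols=["Date", "Time"])
```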
- (2) evaluation criteria
For performance testing, 10 runs were performed for each of the three comparative experiments, and the experimental results are evaluated in terms of:
- Best of 10 MSEs of runs
- Worst of 10 MSEs of runs
- Mean of 10 MSEs of runs
- Standard deviation of 10 MSEs of runs
MSE is the mean square error; it evaluates the predictive performance of the neural networks by comparing prediction errors and is defined in Eq (20).
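The four statistics listed above can be computed as in the sketch below; mse follows the usual definition of the mean square error and is assumed to coincide with Eq (20).

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error between targets and predictions (assumed to match Eq (20))."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

def run_statistics(mse_per_run):
    """Best, worst, mean and standard deviation of the MSEs of the 10 runs."""
    runs = np.asarray(mse_per_run, dtype=float)
    return {"best": runs.min(), "worst": runs.max(),
            "mean": runs.mean(), "std": runs.std()}
```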
- (3) comparison methods
For the first comparative experiment, we selected SCA, MFO, the sparrow search algorithm (SSA) [35] and the atom search optimization algorithm (ASO) [31]. They were fused with the Elman neural network to form SSA-Elman, MFO-Elman, ASO-Elman and SCA-Elman, which are compared with SGWO-Elman. The parameters of all optimization algorithms were set to the same values.
For the second comparative experiment, we selected the traditional Elman neural network, the standard back propagation neural network (BP), the radial basis function neural network (RBF) [70], and the generalized regression neural network (GRNN) [71]. Since the prediction ability of SGWO-Elman is largely determined by the underlying Elman network, these four neural networks are compared with SGWO-Elman.
For the third comparative experiment, we selected the long short-term memory neural network (LSTM) [72] and RBF. They were fused with SGWO to form SGWO-LSTM and SGWO-RBF, which are compared with SGWO-Elman. The parameters of all neural networks were set to the same values.
6.5.2. Comparison experiments based on optimization strategy.
Under the influence of SGWO's performance, SGWO-Elman has better parameter optimization ability. To fairly analyze the optimization effect of SGWO on neural networks, Table 17 compares SGWO-Elman, SSA-Elman, MFO-Elman, ASO-Elman and SCA-Elman on the six datasets using the MSE metric defined in Eq (20). Table 18 shows the prediction ranking of each algorithm on the six datasets.
- (1) prediction performance analysis
From Table 17, all results of SGWO-Elman are optimal except the std on D4 and D6, and are significantly lower than those of the other algorithms. This indicates that SGWO reduces Elman's prediction error and improves its prediction accuracy. Compared with the other evolutionary strategies, the SGWO algorithm based on an adaptive information interaction mechanism is an effective parameter optimization method. On the D1, D5 and D6 datasets, SSA-Elman, MFO-Elman, ASO-Elman and SCA-Elman have large errors. Data analysis shows that D1 has a large amount of data and many features, while the features of D5 and D6 are only weakly correlated, so these three datasets are more difficult to predict. However, SGWO-Elman has a lower error on these three datasets, which indicates that SGWO-Elman is suitable for weakly correlated datasets and shows better prediction ability, stronger stability and higher robustness than the other algorithms.
From Table 18, SGWO-Elman ranks first on all datasets in prediction performance, with the overall ranking SGWO-Elman > SCA-Elman > MFO-Elman > ASO-Elman > SSA-Elman. Therefore, SGWO has accurate prediction performance for the prediction problem and a better optimization effect for the parameter optimization problem.
From Fig 11, the error of SGWO-Elman is lower than that of the other algorithms on all datasets. On D1, the errors of SGWO-Elman and SCA-Elman are close to zero, while those of ASO-Elman and MFO-Elman are very high, followed by SSA-Elman. On the D2–D4 datasets, the overall prediction error is low. Due to the limitations of D5 and D6, the MSE of every algorithm is higher than on the other datasets, but the error distribution of SGWO-Elman remains concentrated on D5. These results show that SGWO-Elman has higher prediction performance and accuracy and is suitable for most data. In practical engineering problems, predicting with SGWO-Elman can therefore bring the greatest economic benefit to a project.
- (2) statistical analysis
The results of the statistical analyses are presented as boxplots in Fig 12. Compared with the other algorithms, SGWO-Elman has a lower median, and its lower quartile is close to the upper quartile on all six datasets. SGWO-Elman has almost no outliers, whereas the other algorithms have more outliers on D1, D2, D4 and D5. The results show that SGWO-Elman has higher prediction performance and stability than the other algorithms, which fully verifies the good applicability of SGWO to Elman parameter optimization.
- (3) training time analysis
To verify the running speed of SGWO-Elman, we tested the five algorithms on the six datasets. Table 19 records the average training time of each algorithm over 10 tests, and Fig 13 displays the corresponding histogram.
From Table 19 and Fig 13, ASO-Elman outperforms the other algorithms in average training time on the six datasets, while SSA-Elman has the longest average training time. The average training time of SGWO-Elman is not significantly different from that of SCA-Elman and MFO-Elman. Overall, the average training time of SGWO-Elman is at a medium level.
6.5.3. Comparison experiments based on neural network.
The prediction ability of SGWO-Elman is largely determined by the underlying Elman network. To fairly analyze the prediction advantages of SGWO-Elman over various neural networks, it was compared with the traditional Elman neural network, the standard BP neural network, the radial basis function neural network (RBF) [70], and the generalized regression neural network (GRNN) [71]. The experimental results are shown in Table 20.
First, compare the prediction results of Elman with BP, RBF and GRNN in Table 20. In terms of the mean, Elman only has an advantage on D3 and D4, is inferior to BP and RBF on D2 and D5 respectively, and has a large error on D1 and D6, which indicates that the overall prediction performance of Elman needs to be improved. The std results show that Elman reaches the lowest error more frequently than the other algorithms, indicating better stability. In terms of the min, Elman and BP each achieve the lowest error on three datasets, while in terms of the max, Elman achieves the lowest error on only three datasets. This indicates that Elman is prone to a large bias when predicting certain sample points, so its prediction effect and robustness still need to be improved. Overall, the comprehensive performance of Elman is slightly better than that of BP, RBF and GRNN.
Next, compare the prediction results of SGWO-Elman with the other algorithms in Table 20. In terms of the mean and max, SGWO-Elman keeps the lowest error on all datasets and ranks first. The std results show that SGWO-Elman has the lowest error on four datasets, indicating that SGWO significantly improves the prediction ability and robustness of Elman. In terms of the min, SGWO-Elman performs best on D1, D2 and D6, ranks second on D3 and D5 and third on D4, and its values on D5 are better than those of the other algorithms by several orders of magnitude. This indicates that SGWO-Elman also achieves better prediction ability on weakly correlated datasets.
A comprehensive analysis shows that SGWO-Elman generally has higher accuracy than the other algorithms; it clearly improves the stability and predictive ability of Elman and lets Elman exploit its strong memory capacity. The neuro-evolution method based on SGWO is effective, and the SGWO-Elman network has higher prediction accuracy. SGWO-Elman can therefore play a greater role in highly complex practical engineering problems, keeping the misjudgment rate as low as possible to reduce the economic losses of engineering production.
Figs 14 and 15 show the MSE results and boxplots, respectively. From Fig 14, SGWO-Elman has a lower prediction error than the other neural networks on all datasets. On D1, D5 and D6, the errors of the other algorithms are very large, while that of SGWO-Elman remains close to zero. This shows that neither the other evolutionary strategies nor the plain neural networks handle these datasets well, whereas SGWO-Elman achieves better prediction performance. On the D2–D4 datasets, SGWO-Elman also maintains better prediction accuracy. The overall analysis shows that the neural network evolution strategy proposed in this paper remedies the shortcomings of traditional neural networks in parameter optimization; Elman based on SGWO is clearly superior to the other neural networks and shows excellent prediction ability on most datasets. From Fig 15, the prediction errors of SGWO-Elman are lower than those of the other neural networks, which indicates that the SGWO-based parameter optimization of the Elman network is effective. Compared with the other algorithms, SGWO-Elman has no outliers on any dataset. In addition, the boxes of SGWO-Elman are centered on all six datasets, which shows that the parameters obtained after multiple iterations yield a relatively stable prediction effect.
6.5.4. SGWO for other neural networks.
SGWO can be extended to other types of neural networks, such as the long short-term memory network (LSTM) and RBF. We incorporated SGWO into LSTM and RBF; the implementation steps of SGWO-LSTM and SGWO-RBF are shown in Fig 16. To verify the advantages of SGWO-Elman in prediction and optimization ability, we compared SGWO-Elman, SGWO-LSTM and SGWO-RBF.
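The general pattern behind these hybrids, independent of the concrete network, can be sketched as follows: the trainable parameters of the network are flattened into a wolf position vector, the fitness of a wolf is the training MSE obtained with those parameters, and SGWO searches that vector space. The names sgwo_minimize, build_network and set_weights below are illustrative placeholders under these assumptions, not the paper's implementation.

```python
import numpy as np

def network_fitness(position, build_network, x_train, y_train):
    """Fitness of one wolf: decode its position vector into the network's
    weights and biases and return the training MSE obtained with them."""
    net = build_network()        # placeholder: constructs an Elman/LSTM/RBF model
    net.set_weights(position)    # placeholder: maps the flat vector to weights
    y_pred = net.predict(x_train)
    return float(np.mean((np.asarray(y_train) - np.asarray(y_pred)) ** 2))

def train_with_sgwo(sgwo_minimize, build_network, dim, x_train, y_train):
    """Wrap any network with SGWO: minimize the training MSE over the flattened
    parameter vector of dimension dim and return the best network found."""
    best_position, best_mse = sgwo_minimize(
        lambda pos: network_fitness(pos, build_network, x_train, y_train), dim)
    best_net = build_network()
    best_net.set_weights(best_position)
    return best_net, best_mse
```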
Table 21 shows the experimental errors of the three algorithms on six datasets.
From Table 21, the mean prediction error of SGWO-Elman on the six datasets is lower than that of SGWO-LSTM and SGWO-RBF, and SGWO-LSTM has better prediction performance than SGWO-RBF. This indicates not only that SGWO, as an optimization algorithm, can significantly improve Elman's prediction performance, but also that the prediction performance of SGWO-Elman is higher than that of the other neural networks. Meanwhile, SGWO-Elman has the lowest std on four datasets, which proves its predictive stability.
7. Conclusion
In this study, an improved grey wolf optimizer was proposed and applied, as an evolutionary strategy, to the parameter optimization of the Elman neural network. Through theoretical analysis and numerical experiments, the optimization and prediction performance of the model were explored, and the following conclusions were obtained:
- SGWO with an adaptive information interaction mechanism was proposed. The method uses circle mapping to initialize the population, strengthens the information exchange among wolves through the Cauchy mutation and the Golden-Sine algorithm, and updates the positions of the wolves with an adaptive distance control weight.
- Theoretical analysis proved that the global convergence probability of SGWO is 1 and that the SGWO search process is a finite homogeneous Markov chain with absorbing states. Numerical experiments on 8 benchmark functions showed that SGWO achieves better convergence accuracy and optimization efficiency than the other 6 algorithms.
- The prediction performance of the SGWO-Elman model was explored through comparative experiments. The results showed that SGWO-Elman has good prediction accuracy, robustness and generalization performance, and its index values on the six datasets were better than those of the other evolutionary strategies and neural networks.
Although both SGWO and SGWO-Elman proposed in this paper perform better than the original algorithms, they still have some limitations. For example:
- The advantage of SGWO is not yet significant when solving practical optimization problems.
- The training time of SGWO-Elman is higher than that of the other optimization-based Elman models.
- Due to the structural characteristics of the metaheuristic algorithm, the optimized neural network suffers from a curse of dimensionality in its complexity, which makes SGWO-Elman challenging to apply to big data prediction and image recognition.
To address the above issues, we will conduct further research in the future, as follows.
- In the future, we plan to improve the encircling mode of SGWO so that the improved strategy is closer to the predatory behavior of wolves in nature. We also plan to build an integrator of practical problems, on which the improved SGWO can be tested to strengthen its optimization ability on practical problems.
- In the future, we plan to reduce the time complexity of SGWO-Elman. Since SGWO-Elman is the fusion of SGWO and Elman and is dominated by the SGWO algorithm, its time complexity is much higher than that of a plain neural network. Follow-up research will therefore try to simplify the optimization process of SGWO to reduce its time and space complexity, thereby reducing the training and testing time of SGWO-Elman.
- In the future, we plan to build a preprocessing system based on SGWO-Elman. We hope that the system can extract the important features of big datasets and images at an early stage and thus reduce the complexity of the data entered into SGWO-Elman. With this preprocessing system, SGWO-Elman will further be applied to big data prediction and complex image recognition.
References
- 1. Elman JL. Finding structure in time. Cognitive Sci. 1990; 14: 179–211.
- 2. Ping X., Yang F., Zhang H., Zhang J. & Zhang W. Elman and back propagation neural networks based working fluid side energy level analysis of shell-and-tube evaporator in organic Rankine cycle (ORC) system. Alex Eng J. 2022; 61: 7339–7352.
- 3. Boualem S. et al. Power management strategy based on Elman neural network for grid-connected photovoltaic-wind-battery hybrid system. Comput Electr Eng. 2022; 99: 107823.
- 4. Aljarah I, Faris H, Mirjalili S. Optimizing connection weights in neural networks using the whale optimization algorithm. Soft Comput. 2018; 22: 1–15.
- 5. Faris H, Aljarah I, Mirjalili S. Training feedforward neural networks using multi-verse optimizer for binary classification problems. Appl Intell. 2016; 45(2): 322–332.
- 6. Miller GF, Todd PM, Hegde SU. Designing neural networks using genetic algorithms. The 3rd International Conf on Genetic Algorithms, George Mason University. 1989; VA: 379–384.
- 7. Rojas I, González J, Pomares H, et al. Statistical analysis of the main parameters involved in the design of a genetic algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). 2002; 32(1): 31–37.
- 8. Liu W, Guo Z, Jiang F, et al. Improved WOA and its application in feature selection. Plos one. 2022; 17(5): e0267041. pmid:35588402
- 9. Abdel-Basset M., El-Shahat D., El-henawy I., de Albuquerque V. H. C. & Mirjalili S. A new fusion of grey wolf optimizer algorithm with a two-phase mutation for feature selection. Expert Sys. Appl. 2020; 139: 112824.
- 10. Golnoori F, Boroujeni F Z, Monadjemi A. Metaheuristic algorithm based hyper-parameters optimization for skin lesion classification. Multimed. Tools. Appl. 2023; 1–33.
- 11. Zhang M, Yang J, Ma R, et al. Prediction of small-scale piles by considering lateral deflection based on Elman Neural Network—Improved Arithmetic Optimizer algorithm. ISA transactions. 2022; 127: 473–486. pmid:34507813
- 12. Jiang X, Duan H, Liao J, et al. Estimation of Soil Salinization by Machine Learning Algorithms in Different Arid Regions of Northwest China. Remote Sens. 2022; 14(2): 347.
- 13. Liu J, Yin Y. Power load forecasting considering climate factors based on IPSO-elman method in China. Energies. 2022; 15(3): 1236.
- 14. Liu B, Zhao Y, Wang W, et al. Compaction density evaluation model of sand-gravel dam based on Elman neural network with modified particle swarm optimization. Front. Phys. 2022; 9: 818.
- 15. Pan Y, Sun Y, Li Z, et al. Machine learning approaches to estimate suspension parameters for performance degradation assessment using accurate dynamic simulations. Reliab Eng Syst Safe. 2023; 230: 108950.
- 16. Zhang J, Xu L, Li J, et al. Parameter Acquisition Study of Mining-Induced Surface Subsidence Probability Integral Method Based on RF-AGA-ENN Model. Geofluids. 2022; 2022.
- 17. Sun Y, Zhang J, Yu Z, et al. WOA (Whale Optimization Algorithm) Optimizes Elman Neural Network Model to Predict Porosity Value in Well Logging Curve. Energies. 2022; 15(12): 4456.
- 18. Zhao L, Zhao X, Pan X, et al. Prediction of daily reference crop evapotranspiration in different Chinese climate zones: Combined application of key meteorological factors and Elman algorithm. J. Hydrol. 2022; 610: 127822.
- 19. Li L, Cheng S, Wen Z. Landslide prediction based on improved principal component analysis and mixed kernel function least squares support vector regression model. J Mt Sci. 2021; 18(8): 2130–2142.
- 20. Luo Y, Qin Q, Hu Z, et al. Path Planning for Unmanned Delivery Robots Based on EWB-GWO Algorithm. Sensors. 2023; 23(4): 1867. pmid:36850464
- 21. Bai S, Zhang S. Evaluation for Development Effect of Enterprise Innovation with Neural Network from Low-Carbon Economy. Wirel Commun Mob Comput. 2022; 2022.
- 22. Zhao H, Jin J, Shan B, et al. Pulsar identification method based on adaptive grey wolf optimization algorithm in X-ray pulsar-based navigations. Adv. Space Res. 2022; 69(2): 1220–1235.
- 23. Crawford B. et al. Binary fruit fly swarm algorithms for the set covering problem. Comput Mater Con. 2022; 71: 4295–4318.
- 24. Salgotra R, Singh U, Sharma S. On the improvement in grey wolf optimization. Neural Comput and Appl. 2020; 32: 3709–3748.
- 25. Prakash Tiwari S, Singh G. Optimizing Job Scheduling Problem Using Improved GA+ CS Algorithm. International Conference on Innovative Computing and Communications: Proceedings of ICICC. 2022; 1: 291–297.
- 26. Nadimi-Shahraki M H, Taghian S, Mirjalili S, et al. MTDE: An effective multi-trial vector-based differential evolution algorithm and its applications for engineering design problems. Appl. Soft Comput. 2020; 97: 106761.
- 27. Wischnewski K., Eickhoff S., and Jirsa V., et al. Towards an efficient validation of dynamical whole-brain models. Sci. Rep. 2022; 12 (1): 1–21.
- 28. Cintrano C, Ferrer J, López-Ibáñez M, et al. Hybridization of evolutionary operators with elitist iterated racing for the simulation optimization of traffic lights programs. Evol comput. 2022: 1–21.
- 29. Zhang X, Liu G, Zhao K, et al. Improved salp swarm algorithm based on gravitational search and multi-leader search strategies. AIMS Mathematics. 2023; 8(3): 5099–5123.
- 30. Mirjalili S., Mirjalili SM. & Hatamlou A. Multi-Verse Optimizer: a nature-inspired algorithm for global optimization. Neural Comput Appl. 2016; 27: 495–513.
- 31. Zhao W., Wang L. & Zhang Z. A novel atom search optimization for dispersion coefficient estimation in groundwater. Future Gener Comp Sy. 2019; 91: 601–610.
- 32. Nadimi-Shahraki M H, Taghian S, Mirjalili S, et al. Mtv-mfo: Multi-trial vector-based moth-flame optimization algorithm. Symmetry. 2021; 13(12): 2388.
- 33. Nadimi-Shahraki M H, Moeini E, Taghian S, et al. DMFO-CD: a discrete moth-flame optimization algorithm for community detection. Algorithms. 2021; 14(11): 314.
- 34. Braik M, Hammouri A, Atwan J, et al. White Shark Optimizer: A novel bio-inspired meta-heuristic algorithm for global optimization problems. Knowl Based Syst. 2022; 243: 108457.
- 35. Xue J. & Shen B. A novel swarm intelligence optimization approach: sparrow search algorithm. Syst Sci Control Eng. 2020; 8: 22–34.
- 36. Zhao H, Jin J, Shan B, et al. Pulsar identification method based on adaptive grey wolf optimization algorithm in X-ray pulsar-based navigations. Adv. Space Res. 2022; 69(2): 1220–1235.
- 37. Xu Z, Yang H, Li J, et al. Comparative study on single and multiple chaotic maps incorporated grey wolf optimization algorithms. IEEE Access. 2021; 9: 77416–77437.
- 38. Wang T, Li J, Liu R, et al. Dynamic Grey Wolf Optimization Algorithm Based on Quasi-Opposition Learning. 3D Imaging—Multidimensional Signal Processing and Deep Learning: 3D Images, Graphics and Information Technologies. 2022: 11–22
- 39. Tang M, Yi J, Wu H, et al. Fault Detection of Wind Turbine Electric Pitch System Based on IGWO-ERF. Sensors. 2021; 21(18): 6215. pmid:34577420
- 40. Song C., Wang X., Liu Z. & Chen H. Evaluation of axis straightness error of shaft and hole parts based on improved grey wolf optimization algorithm. Measurement. 2022; 188, 110396.
- 41. Meidani K., Hemmasian A., Mirjalili S. & Farimani A. B. Adaptive grey wolf optimizer. Neural Comput Appl. 2022; 34: 7711–7731.
- 42. Garg V. E2RGWO: Exploration Enhanced Robotic GWO for Cooperative Multiple Target Search for Robotic Swarms. Arab J Sci Eng. 2022; 1–17.
- 43. Qin H, Meng T, Cao Y. Fuzzy Strategy Grey Wolf Optimizer for Complex Multimodal Optimization Problems. Sensors. 2022; 22(17): 6420. pmid:36080878
- 44. Sun Z, Liu X, Ren L, et al. Improved Exploration-Enhanced Gray Wolf Optimizer for a Mechanical Model of Braided Bicomponent Ureteral Stents. Int J Pattern Recogn. 2022; 36(04): 2259010.
- 45. Nadimi-Shahraki M H, Taghian S, Mirjalili S, et al. GGWO: Gaze cues learning-based grey wolf optimizer and its applications for solving engineering problems. J Comput Sci. 2022; 61: 101636.
- 46. Nadimi-Shahraki M H, Taghian S, Mirjalili S. An improved grey wolf optimizer for solving engineering problems. Expert Syst. Appl. 2021; 166: 113917.
- 47. Yu H. et al. Image segmentation of Leaf Spot Diseases on Maize using multi-stage Cauchy-enabled grey wolf algorithm. Eng Appl Artif Intel. 2022; 109, 104653.
- 48. Pramanik R, Pramanik P, Sarkar R. Breast cancer detection in thermograms using a hybrid of GA and GWO based deep feature selection method. Expert Syst. Appl. 2023; 219: 119643.
- 49. Duan Y, Yu X. A collaboration-based hybrid GWO-SCA optimizer for engineering optimization problems. Expert Syst. Appl. 2023; 213: 119017.
- 50. Bhandari A S, Kumar A, Ram M. Grey wolf optimizer and hybrid PSO‐GWO for reliability optimization and redundancy allocation problem. Qual Reliab Eng Int. 2023; 39(3): 905–921.
- 51. Gupta S. & Deep K. An opposition-based chaotic Grey Wolf Optimizer for global optimisation tasks. J Exp Theor Artif Intell. 2019; 31: 751–779.
- 52. Komatsu T. Several continued fraction expansions of generalized cauchy numbers. Bulletin of the Malaysian Mathematical Sciences Society. 2021; 44: 2425–2446.
- 53. Tanyildizi E. & Demir G. Golden sine algorithm: a novel math-inspired algorithm. Adv Electr Comp Eng. 2017; 17: 71–78.
- 54. Graczyk-Kucharska M., Özmen A., Szafrański M., et al. Knowledge accelerator by transversal competences and multivariate adaptive regression splines. Cent Eur J Oper Res. 2020; 28 (2): 645–669.
- 55. Savku E, Weber G W. Stochastic differential games for optimal investment problems in a Markov regime-switching jump-diffusion market. Ann Oper Res. 2020; 1–26.
- 56. Shenoy A. K. B. & Pai S. N. Search graph magnification in rapid mixing of markov chains associated with the local search-based metaheuristics. Mathematics. 2022; 10, 47.
- 57. Seyyedabbasi A, Kiani F. I-GWO and Ex-GWO: improved algorithms of the Grey Wolf Optimizer to solve global optimization problems. Eng Comput. 2021; 37(1): 509–532.
- 58. Carrasco J., García S., Rueda M., et al. Recent trends in the use of statistical tests for comparing swarm and evolutionary computing algorithms: Practical guidelines and a critical review. Swarm Evol Comput. 2020; 54: 100665.
- 59. Derrac J., García S., Hui S., et al. Analyzing convergence performance of evolutionary algorithms: A statistical approach. Inf. Sci. 2014; 289: 41–58.
- 60. Derrac J., García S., Molina D. & Herrera F. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput. 2011; 1: 3–18.
- 61. Wang J., Li X., Jin L., Li J., Sun Q. & Wang H. An air quality index prediction model based on CNN-ILSTM. Sci. Rep. 2022; 12: 1–16.
- 62. Tzanetos A, Blondin M. A qualitative systematic review of metaheuristics applied to tension/compression spring design problem: Current situation, recommendations, and research direction. Eng Appl Artif Intell. 2023; 118: 105521.
- 63. Huang C, Zhou X, Ran X, et al. Co-evolutionary competitive swarm optimizer with three-phase for large-scale complex optimization problem. Information Sciences. 2023; 619: 2–18.
- 64. Dev S., Massera E., Piga A., Martinotto L. & Di G. On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario. Sensor Actuat B-Chem. 2008; 129: 750–757.
- 65. Cortez P., Cerdeira A., Almeida F., Matos T. & Reis J. Modeling wine preferences by data mining from physicochemical properties. Decis Support Syst. 2009; 47: 547–553.
- 66. Cassotti M., Consonni V., Mauri A. & Ballabio D. Validation and extension of a similarity-based approach for prediction of acute aquatic toxicity towards Daphnia magna. SAR and QSAR Environm Res. 2014; 25: 1013–1036. pmid:25482581
- 67. Cassotti M., Ballabio D., Todeschini R. & Consonni V. A similarity-based QSAR model for predicting acute toxicity towards the fathead minnow (Pimephales promelas). SAR and QSAR Environm Res. 2015; 26: 217–243.
- 68. Lacagnina G. et al. Leading edge serrations for the reduction of aerofoil self-noise at low angle of attack, pre-stall and post-stall conditions. Int J Aeroacoust. 2021; 20: 130–156.
- 69. Yeh I. C., Hsu T. K. Building real estate valuation models with comparative approach through case-based reasoning. Appl Soft Comput. 2018; 65: 260–271.
- 70. Pu L., Li Y., Gao P., Zhang H. & Hu J. A photosynthetic rate prediction model using improved RBF neural network. Sci. Rep. 2022; 12: 9563. pmid:35688825
- 71. Wang W., Dai S., Zhao W. & Wang C. Optimal design of variable gradient tube under axial dynamic crushing based on hybrid TSSA–GRNN method. Struct Multidiscip Optim. 2022; 65: 11.
- 72. Liang B, Wang S, Huang Y, et al. F-LSTM: FPGA-Based Heterogeneous Computing Framework for Deploying LSTM-Based Algorithms. Electronics. 2023; 12(5): 1139.