A hybrid Q-learning sine-cosine-based strategy for addressing the combinatorial test suite minimization problem

The sine-cosine algorithm (SCA) is a new population-based meta-heuristic algorithm. In addition to exploiting sine and cosine functions to perform local and global searches (hence the name sine-cosine), the SCA introduces several random and adaptive parameters to facilitate the search process. Although it shows promising results, the search process of the SCA is vulnerable to local minima/maxima due to the adoption of a fixed switch probability and the bounded magnitude of the sine and cosine functions (from -1 to 1). In this paper, we propose a new hybrid Q-learning sine-cosine-based strategy, called the Q-learning sine-cosine algorithm (QLSCA). Within the QLSCA, we eliminate the switching probability. Instead, we rely on the Q-learning algorithm (based on the penalty and reward mechanism) to dynamically identify the best operation during runtime. Additionally, we integrate two new operations (Lévy flight motion and crossover) into the QLSCA to facilitate jumping out of local minima/maxima and enhance the solution diversity. To assess its performance, we adopt the QLSCA for the combinatorial test suite minimization problem. Experimental results reveal that the QLSCA is statistically superior with regard to test suite size reduction compared to recent state-of-the-art strategies, including the original SCA, the particle swarm test generator (PSTG), adaptive particle swarm optimization (APSO) and the cuckoo search strategy (CS) at the 95% confidence level. However, concerning the comparison with discrete particle swarm optimization (DPSO), there is no significant difference in performance at the 95% confidence level. On a positive note, the QLSCA statistically outperforms the DPSO in certain configurations at the 90% confidence level.


Introduction
An optimization problem relates to the process of finding, from all possible values, the optimal values for the parameters of a given system such that a given objective is minimized or maximized. In past decades, many meta-heuristic algorithms have been proposed in the scientific literature (these include genetic algorithms [1], particle swarm optimization [2], simulated annealing [3], and the bat algorithm [4]) to address such problems. The sine-cosine algorithm (SCA) is a new population-based meta-heuristic algorithm proposed by Mirjalili [5]. In addition to exploiting the sine and cosine functions to perform a local and global search (hence the name sine-cosine), the SCA introduces several random and adaptive parameters to facilitate the search process. Although it shows promising results, the balanced selection of exploration (roaming the random search space on the global scale) and exploitation (exploiting the current good solution in a local region) appears problematic.
Mathematically, sine and cosine are the same operator with a 90-degree phase shift. Therefore, in some cases, the use of either sine or cosine could inadvertently promote similar solutions. Furthermore, the search process is potentially vulnerable to local minima/maxima due to the adoption of a fixed switch probability and the bounded magnitude of the sine and cosine functions (from -1 to 1).
Motivated by these challenges, we propose a new hybrid Q-learning-based sine-cosine strategy called the QLSCA. Hybridization can be the key to further enhancing the performance of the original SCA. Within the QLSCA, we eliminate the switching probability. Instead, we rely on the Q-learning algorithm (based on the penalty and reward mechanism [6]) to dynamically identify the best operation during runtime. Additionally, we combine the QLSCA with two new operations (Lévy flight motion and crossover) to facilitate jumping out of local minima/maxima and enhance the solution diversity. To assess its performance, we adopt the QLSCA for the combinatorial test suite minimization problem. Experimental results reveal that the QLSCA exhibits competitive performance compared to the original SCA and other meta-heuristic algorithms.
Our contributions can be summarized as follows: • A new hybrid Q-learning sine-cosine-based strategy called the Q-learning sine-cosine algorithm (QLSCA) that permits the dynamic selection of local and global search operations based on the penalty and reward mechanism within the framework of the Q-learning algorithm.
• A hybrid of the Lévy flight (originating from the cuckoo search algorithm [7]) and crossover (originating from genetic algorithms [1]) operations within the QLSCA.
• A comparison of the performance of the QLSCA and that of recent state-of-the-art strategies (including the particle swarm test generator (PSTG) [8], DPSO [9], APSO [10], and CS [11]) for the t-way test minimization problem.

Covering Array Notation
The generation (and minimization) of combinatorial test suites from both practical and theoretical perspectives is currently an active research area. Theoretically, the combinatorial test suite depends on a well-known mathematical object called the covering array (CA). Originally, the CA gained attention as a practical alternative to an older mathematical object, the orthogonal array (OA), which had been used for statistical experiments [12,13].
An OA_λ(N; t, k, v) is an N × k array in which, for every N × t sub-array, each t-tuple occurs exactly λ times, where λ = N/v^t; t is the combination strength; k is the number of input parameters (k ≥ t); and v is the number of levels associated with each input parameter of the software-under-test (SUT) [14]. In practice, it is very hard to satisfy these strict rules except in small systems with few input parameters and values. Therefore, there is no significant benefit for medium- and large-scale SUTs because it is very hard to generate OAs. In addition, based on the rules mentioned above, the OA cannot represent a system in which the input parameters have different numbers of levels.
To address the limitations of the OA, the CA was introduced. A CA_λ(N; t, k, v) is an N × k array over {0, ..., v − 1} such that every t-tuple is λ-covered, i.e., every N × t sub-array contains every t-sized combination of the v values at least λ times [15,16]. In the common case of λ = 1, each tuple appears at least once in the CA. In summary, any covering array CA(N; t, k, v) can also be expressed as CA(N; t, v^k).
Variations in the number of levels per parameter can be handled by a mixed covering array (MCA) [17]. An MCA(N; t, k, (v_1, v_2, ..., v_k)) is an N × k array in which the rows of each N × t sub-array cover all t-way interactions of values from the corresponding t columns at least once. For more flexibility in the notation, the array can also be represented by MCA(N; t, v_1^{k_1} v_2^{k_2} ... v_m^{k_m}).

Motivating Example
To illustrate the use of the CA for t-way testing, consider the hypothetical example of an Acme Vegetarian Pizza Ordering System. Referring to Fig 1, the system offers four parameters: Pizza Size, Spicy, Extra Cheese, and Mayonnaise Topping. One parameter takes three possible values (Pizza Size = {Large Pizza, Medium Pizza, Personal Pizza}), while the rest of the parameters take two possible values (Spicy = {True, False}, Extra Cheese = {True, False}, and Mayonnaise Topping = {True, False}). Ideally, an all-exhaustive test combination requires 3 × 2 × 2 × 2 = 24 combinations. In a real-life testing scenario, the number of parameters and values can be enormous, resulting in a potentially large number of tests. Given tight production deadlines and limited resources, test engineers can resort to a t-wise sampling strategy to minimize the test data for systematic testing. In the context of the Acme Vegetarian Pizza Ordering System highlighted earlier, the corresponding mixed-strength CA representation is MCA(N; 2, 3^1 2^3). Ideally, the selection of the (mixed) CA representation depends on the product requirements and the creativity of the test engineers for the given testing scenario.
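To make the size of the problem concrete, the exhaustive and pairwise tuple counts for this example can be computed directly. The following is a minimal Java sketch; the class and method names are ours, not part of the strategy:

```java
public class InteractionCount {
    // Product of all parameter cardinalities = number of exhaustive tests.
    static int exhaustive(int[] v) {
        int n = 1;
        for (int x : v) n *= x;
        return n;
    }

    // Total number of 2-way value tuples that a pairwise test suite must cover.
    static int pairwiseTuples(int[] v) {
        int total = 0;
        for (int i = 0; i < v.length; i++)
            for (int j = i + 1; j < v.length; j++)
                total += v[i] * v[j];
        return total;
    }

    public static void main(String[] args) {
        int[] v = {3, 2, 2, 2}; // Pizza Size, Spicy, Extra Cheese, Mayonnaise Topping
        System.out.println(exhaustive(v));     // 24 exhaustive combinations
        System.out.println(pairwiseTuples(v)); // 30 pairwise tuples to cover
    }
}
```

For this example, 24 exhaustive combinations reduce to 30 pairwise tuples, which a handful of well-chosen test cases can cover.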
Mathematically, the t-way test generation problem can be expressed as:

maximize f(Z) = |{I ∈ V_IL : I is covered by Z}|, subject to Z_i ∈ P_i, i = 1, ..., N

where f(Z) is an objective function (or the fitness function); Z is the test case candidate, which is the set of decision variables Z_i; V_IL is the set of non-covered interaction tuples (I); the vertical bars | | represent the cardinality of the set, so the objective value is the number of non-covered interaction tuples covered by Z; P_i is the set of possible values for each decision variable; N is the number of decision variables (here, the parameters); and K is the number of possible values for the discrete variables.
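The objective function above can be sketched as follows. This is a hypothetical Java fragment; the tuple encoding, class, and method names are ours — each tuple pairs the selected parameter indices with their required values:

```java
public class Fitness {
    // f(Z): number of not-yet-covered interaction tuples that test case z covers.
    // Each tuple is int[2][]: tuple[0] = parameter indices, tuple[1] = required values.
    static int fitness(int[] z, int[][][] uncovered) {
        int count = 0;
        for (int[][] tuple : uncovered) {
            int[] params = tuple[0], values = tuple[1];
            boolean covered = true;
            for (int i = 0; i < params.length; i++) {
                if (z[params[i]] != values[i]) { covered = false; break; }
            }
            if (covered) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Two uncovered 2-way tuples: (P0=1, P1=0) and (P0=1, P2=1).
        int[][][] uncovered = { { {0, 1}, {1, 0} }, { {0, 2}, {1, 1} } };
        System.out.println(fitness(new int[]{1, 0, 0}, uncovered)); // covers 1 tuple
    }
}
```

A candidate that covers more of the still-uncovered tuples scores higher and is therefore a better addition to the final test suite.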

Related Work
As part of the general interest in search-based software engineering (SBSE) approaches [18], much research attention has been given to the application of meta-heuristic search techniques to address the combinatorial test generation problem. Meta-heuristic techniques have had a big impact on the construction of t-way and variable-strength test suites, especially in terms of the optimality of the test suite [19,20,21,22,23].
Meta-heuristic-based strategies often start with a population of random solutions. Then, one or more search operators are iteratively applied to the population to improve the overall fitness (greedily covering the interaction combinations). While there are many variations, the main differences between meta-heuristic strategies lie in the search operators used and in how exploration and exploitation are manipulated.
Cohen et al. [16,24] developed a simulated annealing (SA)-based strategy for supporting the construction of uniform and variable-strength t-way test suites. A large random search space is generated in the implementation. As the algorithm iterates, the strategy chooses better test cases to construct the final test suite using a binary search process and a transformation equation. The search space is transformed from one state to another according to a probability equation. The results of the study are mainly concerned with interaction strengths of two and three [24].
Chen et al. [25] implemented a t-way strategy based on ant colony optimization (ACO). The strategy simulates the behaviour of natural ant colonies in finding paths from the colony to the location of food. Each ant generates one candidate solution and walks through all paths in this solution by probabilistically choosing individual values. When the ant reaches the end of the last path, it returns and updates the initial candidate solution accordingly. This process continues until the iteration is complete. The final test case is chosen according to the maximum coverage of the t-interactions. Unlike the SA, the final test suite is further optimized by a merging algorithm that tries to merge the test cases.
Shiba et al. [26] adopted a genetic algorithm (GA) based on natural selection. Initially, the GA begins with randomly created test cases called chromosomes. These chromosomes undergo crossover and mutation until the termination criterion is met. In each cycle, the best chromosomes are probabilistically selected and added to the final test suite.
Alsewari et al. [19] developed a t-way strategy based on the harmony search algorithm (HSS). The HSS mimics the behaviour of musicians trying to compose good music either by improvising on the best tune they remember or by random sampling. In doing so, the HSS iteratively exploits the harmony memory to store the best solutions found through a number of defined probabilistic improvisations within its local and global search processes. In each improvisation, one test case is selected for the final test suite until all the required interactions are covered. The notable feature of the HSS is that it supports constraints using the forbidden tuple approach.
Ahmed et al. [11] adopted the cuckoo search algorithm (CS), which mimics the unique lifestyle and aggressive reproduction strategy of the cuckoo. First, the CS generates random initial eggs in host nests. Each egg in a nest is a vector solution that represents a test case. In each generation, two operations are performed. Initially, a new nest is generated (typically through a Lévy flight) and compared with the existing nests. The new nest replaces the current nest if it has a better objective function value. Then, the CS adopts probabilistic elitism to maintain the elite solutions for the next generation.
Particle swarm optimization (PSO) [2] is perhaps the most popular basis for t-way test suite generation. A PSO-based t-way strategy searches by mimicking the swarm behaviour of flocking birds. In PSO, global and local searches are guided by the inertia weight and the social/cognitive parameters. Initially, a random swarm is created. Then, the PSO algorithm iteratively selects a candidate solution within the swarm to be added to the final test suite until all interaction tuples are covered (based on velocity and displacement transformations). Ahmed et al. developed early PSO-based strategies called the PSTG [8,27] and APSO [10]. APSO is an improvement on the PSTG integrated with adaptive tuning based on the Mamdani fuzzy inference system [28,29]. Wu et al. implemented discrete PSO (DPSO) [9] by substantially modifying the displacement and velocity transformations used in PSO. The benchmark results of DPSO [9] demonstrate its superior performance when compared with both the PSTG and APSO.
Despite the significant number of algorithms proposed in this field, the adoption of new meta-heuristic algorithms is most welcome. The no free lunch (NFL) theorem [30] suggests that no single meta-heuristic algorithm can outperform all others on every problem; even a slight change in the (t-way) configuration can change which algorithm performs best. Therefore, the NFL theorem allows researchers to propose new algorithms or modify current ones to enhance the current solutions. In fact, the results could also be applied in other fields.
Hybrid integration with machine learning appears to be a viable approach to improving state-of-the-art meta-heuristic algorithms. Machine learning relates to the study of the fundamental laws that govern the learning process in computers, concerning the construction of systems that can automatically learn from experience. Machine learning techniques can be classified into three types: supervised, unsupervised, and reinforcement [31]. Supervised learning involves learning a direct functional input-output mapping based on some set of training data so as to be able to predict outputs for new data. Unlike supervised learning, unsupervised learning does not require explicit training data. Specifically, unsupervised learning involves learning by drawing inferences (e.g., clustering) from an input dataset. Reinforcement learning learns mappings between states and actions to maximize a reward signal through experimental discovery. This type of learning differs from supervised learning in that it relies on a punishment and reward mechanism and never corrects input-output pairs (even when dealing with suboptimal responses).
Combinatorial test suite minimization is one of the crucial elements of an efficient test design [32,33]. This area is worth further exploration, especially when we can take advantage of machine learning's benefits. To be specific, our approach focuses on the hybridization of a meta-heuristic algorithm with reinforcement learning based on the Q-learning algorithm [6]. The Q-learning algorithm is attractive due to its successful adoption in many prior works. Ant-Q [34] is the first attempt by researchers to integrate a meta-heuristic algorithm (ACO) with Q-learning. Although its integration with Q-learning is useful (e.g., it has been successfully adopted for the 2-dimensional cutting stock problem [35] and the nuclear reload problem [36]), the approach appears too specific to ACO because pheromones and evaporation are modelled as part of the Q-learning updates (as rewards and punishments). In a more recent study, RLPSO [37], a PSO algorithm integrated with Q-learning, was successfully developed (and adopted in selected case studies of gear and pressure vessel design problems and standard benchmark functions). While it has merit, the approach is computationally intensive and complex because each particle in the swarm must carry its own Q-metrics. Therefore, the RLPSO approach is not sufficiently scalable for large-scale combinatorial problems requiring large population sizes; in that study, the swarm size was limited to 3.
By building on and complementing the work mentioned above, our work explores the hybridization of the Q-learning algorithm with a recently developed meta-heuristic algorithm called the SCA [5]. Unlike most meta-heuristic algorithms, which mimic certain physical or natural phenomena, the equation transformation used in the SCA is based solely on the sine and cosine operations. Therefore, the learning curve of the SCA is low. Although its exploitation is commendable, the exploration of the SCA is strictly bounded due to the (adaptive) shrinking magnitude of the sine and cosine function multipliers during the search process. To address the issues mentioned above, we propose a new algorithm, the QLSCA. Moreover, we augment the QLSCA with two further operations (Lévy flight motion and crossover) to counterbalance its exploration and exploitation. Then, we use the Q-learning algorithm (which is based on the penalty and reward mechanism) to dynamically identify the best operation (sine, cosine, Lévy flight motion, or crossover) during runtime.

Overview of the SCA
The SCA is a population-based meta-heuristic algorithm [5]. As the name suggests, the SCA exploits the sine and cosine functions to update its population's positions. Each position is treated as a vector. To be specific, the positions are updated based on:

X_i^{t+1} = X_i^t + r_1 × sin(r_2) × |r_3 P_i^t − X_i^t|, when r_4 < 0.5
X_i^{t+1} = X_i^t + r_1 × cos(r_2) × |r_3 P_i^t − X_i^t|, otherwise

where X_i^t is the position of the current solution in the i-th dimension at the t-th iteration; r_1, r_2, r_3, and r_4 are random/adaptive parameters (with r_2 ∈ [0, 2π] and r_4 ∈ [0, 1]); P_i is the position of the best destination point in the i-th dimension; and | | indicates the absolute value.
Due to their importance to the exploration and exploitation of the SCA, the four main parameters r_1, r_2, r_3, and r_4 require further elaboration. The parameter r_1 dictates the radius of the search circle (displacement size). It is also possible to adaptively and dynamically vary r_1 during the iteration process using:

r_1 = M − t (M / T)

where t is the current iteration; T is the maximum number of iterations; and M is a constant. Due to the cyclic nature of sine and cosine, the parameter r_2 defines whether the motion is inward (the direction of exploitation, when sine and cosine are negative) or outward (the direction of exploration, when sine and cosine are positive), as can be seen in Fig 3. The parameter r_3 brings in a random weight for the best position to affect the overall displacement from the current position. Finally, the parameter r_4 switches with equal probability between the sine and cosine components.
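A one-dimensional sketch of these updates follows, assuming the standard SCA parameter ranges; the deterministic `step` signature is ours, exposing the r parameters for clarity:

```java
import java.util.Random;

public class ScaUpdate {
    // Linearly decreasing search radius: r1 = M - t * (M / T).
    static double r1(double M, int t, int T) {
        return M - t * (M / T);
    }

    // Position update for one dimension: sine branch when r4 < 0.5, else cosine.
    static double step(double x, double best, double r1,
                       double r2, double r3, double r4) {
        double pull = Math.abs(r3 * best - x); // distance to the weighted best point
        return (r4 < 0.5) ? x + r1 * Math.sin(r2) * pull
                          : x + r1 * Math.cos(r2) * pull;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        double x = 0.5, best = 2.0;
        for (int t = 0; t < 100; t++) {
            x = step(x, best, r1(3.0, t, 100),
                     2 * Math.PI * rnd.nextDouble(), // r2 in [0, 2*pi]
                     2 * rnd.nextDouble(),           // r3 in [0, 2]
                     rnd.nextDouble());              // r4 in [0, 1]
        }
    }
}
```

As r_1 shrinks toward 0, the magnitude of each displacement contracts, which is precisely the bounded-exploration behaviour the QLSCA aims to counterbalance.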
To summarize, the general pseudo code for the SCA algorithm is given in Algorithm 1.

The Proposed Strategy
The proposed QLSCA-based strategy integrates Q-learning with the sine and cosine operations, Lévy flight motion, and crossover. Lévy flight and crossover were selected for two reasons. Firstly, the Lévy flight operator is a well-known global search operator [38]. Activating the Lévy flight operator can potentially propel the search process out of a local optimum. Secondly, crossover can be considered both a global and a local search operation [1]. For instance, 1- or 2-point crossover can be regarded as local searching. However, crossover at more than 2 points is essentially global searching. Such flexible behaviour balances the intensification and diversification of the QLSCA.
Having justified the adoption of Lévy flight motion and crossover, the detailed explanation of the proposed QLSCA is as follows:

Algorithm 1: Pseudo Code for the SCA Algorithm [5]
Input: Initialize random population X
while (stopping criteria not met) do
    Set initial r_1 using Eq. 3
    for iteration = 1 till max iteration do
        for population count = 1 to population size do
            Evaluate each member of X by the objective function
            Update the best solution obtained so far, X_best
            Update the position of X using Eq. 2 or Eq. 3
        Update r_1 adaptively using Eq. 4
Return the updated population X and the best result (X_best)

Q-Learning Algorithm
The Q-learning algorithm [6] learns the optimal selection policy by interacting with the environment. The algorithm works by estimating the best state-action pair through the manipulation of a Q(s, a) table. To be specific, the Q(s, a) table uses a state-action pair to index a Q-value (a cumulative reward). The Q(s, a) table is dynamically updated based on the reward or punishment (r) of a particular state-action pair.
The optimal settings for α_t, γ, and r_t within the Q-learning algorithm require further clarification. When α_t is close to 1, higher priority is given to the newly gained information in the Q-table updates, whereas a small value of α_t gives higher priority to existing information. To facilitate exploration of the search space (to maximize learning from the environment), α_t can be set to a high value during early iterations and adaptively reduced in later iterations (to exploit the current best Q-value). The parameter γ functions as a scaling factor for rewarding or punishing the Q-value based on the current action. When γ is close to 0, the Q-value is based solely on the current reward or punishment. When γ is close to 1, the Q-value is based on the current and previous rewards and/or punishments. The literature suggests setting γ = 0.8 [37].
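The standard Q-learning update rule that these parameters feed into, Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)], can be sketched as a minimal Java illustration; the 4 × 4 table size anticipates the four QLSCA operators, and the class layout is ours:

```java
public class QTable {
    final double[][] q; // rows: states, columns: actions

    QTable(int n) { q = new double[n][n]; }

    static double maxRow(double[] row) {
        double m = row[0];
        for (double v : row) m = Math.max(m, v);
        return m;
    }

    // Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    double update(int s, int a, int sNext, double reward,
                  double alpha, double gamma) {
        q[s][a] += alpha * (reward + gamma * maxRow(q[sNext]) - q[s][a]);
        return q[s][a];
    }

    public static void main(String[] args) {
        QTable table = new QTable(4);
        table.q[0][1] = 1.22; // current Q-value for the (sine, cosine) pair
        // Punish with r = -1.0, gamma = 0.1, alpha = 0.7 (next-state row all zeros).
        System.out.println(table.update(0, 1, 1, -1.0, 0.7, 0.1)); // Q(0,1) decreases
    }
}
```

A punished state-action pair loses Q-value, making the corresponding operator less likely to be selected at the next decision point.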
The parameter r_t serves as the actual reward or punishment. In our current study, r_t is set to a reward when the selected operation improves the current best fitness and to a punishment otherwise. Summing up, the pseudo code of the Q-learning algorithm is illustrated in Algorithm 2.

Lévy Flight Motion
To complement the sine and cosine operations within the SCA and ensure that the developed QLSCA can jump out of local minima, we propose the Lévy flight motion operation (the supporting Q-learning pseudo code is given in Algorithm 2: Pseudo Code for the Q-Learning Algorithm). Mathematically, the step length s of a Lévy flight motion can be defined as follows [7]:

s = u / |v|^(1/β)

where u and v are approximated from normal Gaussian distributions, with u ~ N(0, σ_u²) and v ~ N(0, σ_v²). For the v value estimation, we use σ_v = 1. For the u value estimation, we evaluate the Gamma function (Γ) [40] with the value β = 1.5 [38] and obtain σ_u using:

σ_u = { [Γ(1 + β) × sin(πβ/2)] / [Γ((1 + β)/2) × β × 2^((β−1)/2)] }^(1/β)

A Lévy flight motion displacement update (with the exclusive OR operation ⊕) is then defined as:

X_i^{t+1} = X_i^t ⊕ Lévy(β)
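The step-length computation (Mantegna's algorithm, as used in cuckoo search) can be sketched in Java; since `java.lang.Math` has no Gamma function, the two Γ values for β = 1.5 are precomputed constants here:

```java
import java.util.Random;

public class LevyStep {
    static final double BETA = 1.5;
    // Gamma(1 + beta) = Gamma(2.5) = (3/4) * sqrt(pi); Gamma((1 + beta)/2) = Gamma(1.25).
    static final double GAMMA_NUM = 0.75 * Math.sqrt(Math.PI);
    static final double GAMMA_DEN = 0.9064024771;

    // sigma_u = { Gamma(1+b) sin(pi*b/2) / [Gamma((1+b)/2) * b * 2^((b-1)/2)] }^(1/b)
    static final double SIGMA_U = Math.pow(
            GAMMA_NUM * Math.sin(Math.PI * BETA / 2)
                    / (GAMMA_DEN * BETA * Math.pow(2, (BETA - 1) / 2)),
            1 / BETA); // roughly 0.6966 for beta = 1.5

    // step = u / |v|^(1/beta), with u ~ N(0, sigma_u^2) and v ~ N(0, 1)
    static double step(Random rnd) {
        double u = rnd.nextGaussian() * SIGMA_U;
        double v = rnd.nextGaussian();
        return u / Math.pow(Math.abs(v), 1 / BETA);
    }

    public static void main(String[] args) {
        System.out.println(SIGMA_U);            // scale used for the u samples
        System.out.println(step(new Random())); // one heavy-tailed step length
    }
}
```

Most steps are small, but the heavy tail occasionally produces very long jumps, which is exactly what lets the search escape a local optimum.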

Crossover Operation
The crossover operation is derived from GAs. Ideally, crossover is a local search operation whereby two distinct population members X_i and X_j exchange their partial values based on some random length β. Visually, crossover is represented in Fig 5.
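A two-point variant of this exchange can be sketched as follows (the class and method names are ours; a segment [lo, hi) of the chosen length is swapped between X_i and X_j):

```java
public class Crossover {
    // Swap the segment [lo, hi) between two candidate test cases,
    // returning the two offspring and leaving the parents untouched.
    static int[][] twoPoint(int[] xi, int[] xj, int lo, int hi) {
        int[] ci = xi.clone(), cj = xj.clone();
        for (int k = lo; k < hi; k++) {
            ci[k] = xj[k];
            cj[k] = xi[k];
        }
        return new int[][]{ci, cj};
    }

    public static void main(String[] args) {
        int[][] kids = twoPoint(new int[]{1, 1, 1, 1}, new int[]{0, 0, 0, 0}, 1, 3);
        // kids[0] = {1, 0, 0, 1}, kids[1] = {0, 1, 1, 0}
    }
}
```

With random crossover points, short swapped segments behave like a local perturbation, while longer ones approach a global recombination of the two candidates.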
The displacement due to crossover can be updated in three steps.

The QLSCA Algorithm

Exploration and exploitation are the key components of reinforcement learning algorithms (such as Q-learning). Exploration is necessary to understand the long-term rewards and punishments to be used later, during exploitation of the search space. Often, it is desirable to explore during the early iterations; during later iterations, it is then desirable to favour exploitation. To achieve such an effect, Mauzo et al. suggest adopting the Metropolis probability function criterion [41], which is mainly used in simulated annealing. Alternatively, a more straightforward probability criterion with a similar effect (decreasing over time) can be defined as:

if rand > current iteration / max iteration, go to exploration mode; otherwise, go to exploitation mode (13)

Our early experiments with the Metropolis probability function indicate no significant performance difference from the probability given in Eq. 13. Furthermore, the Metropolis probability function's exploitation of the current and previous values can be problematic, since Q-learning is a Markov decision process that relies on the current and forward-looking Q-values. For these reasons, we do not favour Metropolis-like probability functions.
To ensure that the learning is adequate (i.e., that the roaming of the search space is sufficient), the QLSCA updates the Q-table for one complete episode cycle (in some random order) for each exploration opportunity. To support the use of the 4 search operators within the QLSCA (sine, cosine, Lévy flight, and crossover), the Q-table is constructed as a 4 × 4 matrix in which the rows represent the state (s_t) and the columns represent the action (a_t) for each state. Fig 6 depicts a snapshot of the Q-table for the QLSCA along with a numerical example. Assume that the current state-action pair is s_t = Sine Operator and a_t = Cosine Operator. The search process selects one of the four operators (sine, cosine, Lévy flight, and crossover) as the next action (a_t) based on the maximum reward defined in the state-action pairs within the Q-table. This is unlike the original SCA algorithm, in which the cosine or sine operator is selected based on the probability parameter r_4.
Referring to Fig 6, we assume that the settings are as follows: the current value stored in the Q-table for the current state is Q_{(t+1)}(s_t, a_t) = 1.22 (grey circle in Fig 6); the reward is r_t = −1.00; the discount factor is γ = 0.10; and the current learning factor is α_t = 0.70. Then, the new value for Q_{(t+1)}(s_t, a_t) in the Q-table is updated based on Eq. 4. The state is then changed from sine to cosine. Similarly, the action a_t = Cosine Operator is changed to Lévy flight. It should be noted that during both the exploration and exploitation Q-value updates, the meta-heuristic QLSCA search process continues in the background. In other words, for each update, X_best is kept and the population X is updated accordingly.

Finally, because the adopted Lévy flight makes sporadic long jumps, a positional update may sometimes produce out-of-range values. Within the QLSCA, we establish a clamping rule to apply lower and upper bounds to parameter values. In this way, when X_j moves out of range, the boundary condition brings it back into the search space. There are at least three types of boundary conditions: invisible walls, reflecting walls, and absorbing walls [42]. With invisible walls, when a current value goes beyond the boundary, the corresponding fitness value is not computed. With reflecting walls, when a value reaches the boundary, it is reflected back into the search space (mirroring effect). With absorbing walls, when a current value moves out of range, the boundary condition brings it back into the search space by moving it to the other endpoint. For example, if the position of a parameter is limited to the range from 1 to 4, then, when the position exceeds 4, it is reset to 1. For our QLSCA implementation, we favour absorbing walls for our clamping rule.
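The absorbing-wall rule from the 1-to-4 example can be sketched as follows (the class and method names are ours):

```java
public class AbsorbingWall {
    // Absorbing-wall clamping: an out-of-range value re-enters the search
    // space at the opposite endpoint of [lo, hi].
    static int clamp(int x, int lo, int hi) {
        if (x > hi) return lo; // e.g., a value above 4 is reset to 1
        if (x < lo) return hi;
        return x;              // in-range values are untouched
    }

    public static void main(String[] args) {
        System.out.println(clamp(5, 1, 4)); // wraps to 1
        System.out.println(clamp(3, 1, 4)); // stays 3
    }
}
```

Unlike a reflecting wall, this wrap-around keeps the displacement direction of the long Lévy jumps instead of mirroring them back toward the boundary.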
To summarize, the complete QLSCA algorithm can be described in three main steps (Step A: Initialization, Step B: Exploration to Update the Q-table, and Step C: Exploitation to Update the Q-table), as shown in Algorithm 3. As the name suggests, Step A involves initialization. Step B performs a complete state-action pair update cycle in random order. Finally, Step C updates only the currently selected state-action pair.
Algorithm 3: Pseudo Code for the QLSCA Algorithm

Having described the QLSCA algorithm, the following sections outline its use in addressing the t-way test suite generation problem. In general, the QLSCA strategy is a composition of two main algorithms: (1) an interaction tuples generation algorithm, which generates the combinations of parameters that are used by the test suite generator for optimization purposes, and (2) a QLSCA-based test suite generation algorithm. These two algorithms are detailed in the next sections.

Interaction Tuples Generation Algorithm
The interaction tuples generation algorithm involves generating the parameter (P) combinations and the values (v) for each parameter combination. The parameter combination generation process uses binary digits: 0 indicates that the corresponding parameter is excluded, and 1 indicates that it is included.
Consider an example involving MCA(N; 2, 2^3 3^1), as shown in Fig 7. This configuration requires 2-way interactions for a system of four parameters. First, the algorithm generates all possible binary numbers with up to four digits because there are 4 parameters. From these possibilities, the binary numbers that contain exactly two 1s are selected; these indicate a pairwise interaction of parameters (t = 2). For example, the binary number 1100 refers to a P_1 P_2 interaction. P_1 has two values (0 and 1), P_2 has two values (0 and 1), P_3 has two values (0 and 1), and P_4 has three values (0, 1, and 2). The 2-way parameter interaction has six possible combinations based on the parameter generation algorithm. For combination 1001, in which P_1 and P_4 are available, there are 2 × 3 possible interactions between P_1 and P_4. For each parameter in the combination (with two 1s), the value of the corresponding parameter is included in the interaction elements. In this example, the excluded values are marked as "do not matter". This process is repeated for the other five interactions: (P_1, P_2), (P_1, P_3), (P_2, P_3), (P_2, P_4), and (P_3, P_4).
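The binary enumeration step can be sketched with bit masks (a minimal Java illustration; here bit i of the mask stands for parameter P_{i+1}, which reverses the textual left-to-right digit order but selects the same combinations):

```java
import java.util.ArrayList;
import java.util.List;

public class TupleGen {
    // All k-bit masks containing exactly t ones; each mask selects one
    // t-way parameter combination (bit i set = parameter i included).
    static List<Integer> masks(int k, int t) {
        List<Integer> out = new ArrayList<>();
        for (int m = 0; m < (1 << k); m++) {
            if (Integer.bitCount(m) == t) out.add(m);
        }
        return out;
    }

    // Number of value tuples for one combination, given each parameter's cardinality.
    static int tupleCount(int mask, int[] v) {
        int n = 1;
        for (int i = 0; i < v.length; i++) {
            if ((mask & (1 << i)) != 0) n *= v[i];
        }
        return n;
    }

    public static void main(String[] args) {
        int[] v = {2, 2, 2, 3};                    // MCA(N; 2, 2^3 3^1)
        System.out.println(masks(4, 2).size());    // 6 pairwise parameter combinations
        System.out.println(tupleCount(0b1001, v)); // (P1, P4): 2 * 3 = 6 value tuples
    }
}
```

For the worked example this yields the six parameter combinations from the text, with the (P_1, P_4) combination contributing 2 × 3 = 6 value tuples.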
To ensure efficient indexing for storage and retrieval, we opted to implement an interaction tuple hash table (H_s) that uses the binary representation of the interaction as the key. The complete algorithm for generating the interaction elements is highlighted in Algorithm 4.

Test Suite Generation Algorithm
The principle underlying the QLSCA-based strategy is highlighted in Fig 6. Nevertheless, to apply the general QLSCA to the t-way test generation problem, three adaptations must be made. The first adaptation involves the input parameters. To cater to the t-way problem, the QLSCA needs to process the parameters (k), the values (v) of each parameter, and the interaction strength (t). Based on these values, the interaction tuples can be generated.
The second adaptation concerns the way the population is represented within the QLSCA. The t-way test generation problem is a discrete combinatorial problem. Therefore, the QLSCA initializes the population search space as a D-dimensional integer population X_j = {X_j,1, X_j,2, X_j,3, ..., X_j,D} in which each dimension represents a parameter and contains an integer between 0 and v_i − 1, where v_i is the number of values the i-th parameter takes.
Finally, the third adaptation is to the stopping criterion. When any particular interaction tuples have been covered (and the test case covering those tuples has been added to the final test suite F_s), the tuples are deleted from the interaction tuples list (refer to step 30 in Algorithm 5). Therefore, the QLSCA stops when the interaction tuples list is empty (refer to step 5 in Algorithm 5). The complete test suite generation algorithm based on the QLSCA is summarized in Algorithm 5.

Experimental Study
Our experiments focus on three related goals: (1) to characterize the performance of the QLSCA in comparison to that of the SCA, (2) to benchmark the QLSCA and the SCA against other meta-heuristic approaches, and (3) to analyse the results statistically.
In the second part, we benchmark the sizes of the test suites generated by our SCA and QLSCA against those of existing meta-heuristics based on the benchmark t-way experiments published in [9]. Specifically, the benchmark experiments involve CA(N; t, 3^k) with varying t (from 2 to 4) and k (from 2 to 12), and CA(N; t, v^7) and CA(N; t, v^10) with varying t (from 2 to 4) and v (from 4 to 6).
In the third part, we analyse our results statistically. We intend to determine whether the performance of the QLSCA at minimizing the test suite size is a statistically significant improvement over existing strategies.
Algorithm 5: The QLSCA Strategy

We developed the SCA and the QLSCA using the Java programming language. For all experiments involving the SCA and the QLSCA, we set the population size to 40, the maximum number of iterations to 100, and the constant M = 3 (refer to Eq. 3). We execute the QLSCA and the SCA 30 times to ensure statistical significance. Our platform comprises a PC running Windows 10 with a 2.9 GHz Intel Core i5 CPU, 16 GB of 1867 MHz DDR3 RAM and a 512 MB flash HDD.
The best and mean times and sizes (whenever applicable) for each experiment are reported side-by-side. The best cell entries are marked, while the best mean cell entries are in bold. Unavailable entries are denoted by NA.
To put our work into perspective, we highlight all the parameters for the strategies of interest (the PSTG [8], DPSO [9], APSO [10], and the CS [11]) obtained from their respective publications (as depicted in Table 2).

Characterizing the Performance of the SCA and the QLSCA
This section highlights the experiments that compare the SCA and the QLSCA with respect to test suite size, execution time, consistency (i.e., the range of variation in the generated results) and convergence patterns. To perform this comparison objectively, both strategies adopt the same parameter settings and are implemented using the same data structure and implementation language.
Table 3 highlights our results for the test size and execution time.

Benchmarking with other Meta-Heuristic based Strategies
Unlike the experiments in the previous section, the benchmark experiments in this section (as adopted from [9]) also compare the QLSCA's and the SCA's performances to those of all the other strategies. However, the execution times have been omitted due to differences in the running environment, in the parameter settings (e.g., PSO relies on the population size, the inertia weight, and social and cognitive parameters, while the cuckoo search relies on the elitism probability, the number of iterations, and the population size) and in the implementation (e.g., the data structure and the implementation language). Tables 3 to 10 highlight our complete results.

Statistical Analysis
Our statistical analysis of all the results obtained in Tables 3 to 10 is based on 1 × N pair comparisons. The Wilcoxon rank-sum test is used to assess whether the control strategy provides results that are significantly different from those of the other strategies. To handle family-wise error rate (FWER) inflation due to multiple comparisons, we adopted the Bonferroni-Holm [43] correction, adjusting the value of α (based on p_holm = α/i) with the p-values taken in ascending order. For an i-ordered strategy, the p-value p_i is compared with the value of p_holm in the same row of the table. In this study, α is set to 0.05 and 0.10 because most strategies are well-tuned and report their known best test suite sizes. Whenever the p-value p_i is less than the corresponding value of p_holm, the results imply that the test suite is smaller for the QLSCA than for the i-ordered strategy, which means that the QLSCA has a smaller median population. Table 11 summarizes the overall statistical analysis.
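The Holm step-down adjustment described above can be sketched in a few lines. Note one caveat: the classic Holm threshold for the i-th smallest of m p-values is α/(m - i + 1), which matches the text's "α/i" when i counts strategies from the bottom of the table; the p-values below are hypothetical, not the paper's:

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: sort p-values ascending and compare the
    smallest against alpha/m, the next against alpha/(m-1), and so on,
    stopping at the first non-significant comparison."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):       # rank = 0 for the smallest p
        threshold = alpha / (m - rank)       # alpha/m, alpha/(m-1), ...
        if p_values[idx] <= threshold:
            reject[idx] = True
        else:
            break                            # stop at the first failure
    return reject

# Hypothetical p-values for 1 x N comparisons against the control strategy.
print(holm_reject([0.001, 0.02, 0.04, 0.30], alpha=0.05))
# [True, False, False, False]
```

Here only the smallest p-value survives the correction: 0.001 ≤ 0.05/4, but 0.02 > 0.05/3, so the procedure stops.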

Experimental Observation
Reflecting on the experimental results yields a number of observations. Concerning the first set of experiments described in Section 7.1, Table 3 compares the performances of the QLSCA and the SCA in terms of test size and execution time. We note that for all mean test sizes, the QLSCA outperforms the SCA (6 of 6). A similar trend can be seen for the best test size: the QLSCA outperforms the SCA in all cases, although, on a positive note, the SCA can match the result of the QLSCA for CA(N; 3, 4^6). As expected, the SCA outperforms the QLSCA in terms of execution time in all cases due to the overhead introduced by the Q-learning algorithm. Arguably, the trade-off between execution time and test size is worthwhile to ensure efficient, high-quality tests and to promote cost savings.
The box plot analyses of Table 3 shown in Fig 8 (a)-(f) reveal a number of salient characteristics of the QLSCA and the SCA search processes. Considering CA(N; 2, 3^13), even though they have the same quartile bias range (i.e., they incline towards the lower quartile) and the same top-to-bottom whisker width, the QLSCA has a lower median than the SCA.

As for Table 7, the QLSCA contributes 66.66% (4 of 6 entries) as far as the best mean test size is concerned. The other best mean test sizes are shared by APSO and DPSO with 16.66% (1 of 6 entries) each. DPSO performs the best with respect to the best test size with 66.66% (4 of 6 entries). The QLSCA comes in second with 50% (3 of 6 entries). APSO and the CS come in third with 16.66% (1 of 6 entries) each. The PSTG and the SCA do not contribute as far as the best test size is concerned.
In Table 8, the QLSCA outperforms the other strategies with 66.66% (4 of 6 entries) as far as the mean test size is concerned. The PSTG and DPSO share the rest of the best mean test sizes with 16.66% (1 of 6 entries) each. The other strategies do not contribute towards the best mean test size. A similar observation can be made as far as the best test size is concerned. The QLSCA outperforms all the other strategies with 66.66% (4 of 6 entries). All the other strategies contribute at least 16.66% (1 of 6 entries), except for the PSTG, which contributes 0%.
According to Table 9, DPSO dominates as far as the best mean test size is concerned with 66.66% (4 of 6 entries). The QLSCA and the CS come in second with 16.66% (1 of 6 entries) each. The SCA and the other strategies do not contribute as far as the best mean test size is concerned. A similar observation can be made in the case of the best test size. DPSO outperforms all the other strategies with 66.66% (4 of 6 entries). The QLSCA and the CS are second and contribute 16.66% (1 of 6 entries) each. The SCA and the other strategies do not contribute to the best test size.
Concerning the results in Table 10, we observe that the QLSCA outperforms the other strategies with 77.77% (7 of 9 entries) as far as the best mean test size is concerned. DPSO comes in second with 22.22% (2 of 9 entries). Aside from the QLSCA and DPSO, no other strategies contribute towards the best mean test size. Similarly, no strategies contribute towards the best test size apart from the QLSCA and DPSO, with 88.88% (8 of 9 entries) and 44.44% (4 of 9 entries), respectively.
The statistical analyses for Tables 3 through 10 (given in Table 11) indicate that the QLSCA statistically dominates all the state-of-the-art strategies at the 90% confidence level. From the statistical analysis for Table 3, the significance of the QLSCA in comparison to the SCA is evident. The analyses for Tables 4 through 6 also support the alternative hypothesis (that the QLSCA performs better than the PSTG, the SCA, and the CS). The contributions of DPSO and APSO are ignored because results are unavailable for some CAs. Referring to the analyses for Tables 7 to 9, the QLSCA is better than the SCA but not DPSO at the 95% confidence level (excluding contributions from the PSTG, APSO, and the CS). However, at the 90% confidence level, the QLSCA is better than DPSO for Tables 7 and 10.

Threats to Validity
Most research in this field addresses various threats to validity during experiments and evaluation. These threats concern internal and external validity and depend on the type of research. This study is not infallible with respect to these threats. Threats to external validity occur when experimental results cannot be generalized to real-world problems. Here, there is no guarantee that the adopted benchmarks represent real-world applications with the same numbers of parameters and values and the same interaction strength. We have tried to mitigate this threat by choosing the most common and realistic benchmarks in the literature for the experiments. These benchmarks are widely used for evaluations and have been selected from real configurable software or obtained from a simulation of possible configurations.
Threats to internal validity concern factors that affect the experiments without our knowledge and/or are out of our control. The differences in population size, number of iterations and parameter settings of each meta-heuristic-based strategy are examples of threats to internal validity. Because source code is not available for all implementations, we cannot ensure that the compared strategies perform the same number of fitness function evaluations as the QLSCA. Despite these differences, we believe that our comparison is valid because the published test size results are obtained using the best control parameter settings and are not affected by the operating environment. In fact, in addition to the best size results, we relied on the mean results to ascertain the performance of each strategy due to the randomness of each meta-heuristic run.
Another threat to internal validity is the generation time for each strategy. The size of the test suite is not affected by the environment; however, the generation time for the test suite is strongly affected by the running environment. Therefore, we cannot directly compare generation times with published results. To compare generation times fairly, all strategies would need to be implemented and run in the same environment. In many cases, the strategies may also need to be implemented in the same programming language using the same data structure (in addition to running for the same number of iterations).
Finally, the choice of reinforcement learning based on the Q-learning algorithm may be another threat to internal validity. State-action-reward-state-action (SARSA) [29], a competitor to Q-learning, could also be chosen for the QLSCA. Unlike Q-learning, which exploits look-ahead rewards, SARSA obtains rewards directly from the actual next state. We believe that because, most of the time, the look-ahead reward eventually becomes the actual reward (except when there is a tie in the Q-table), the choice between SARSA and Q-learning is immaterial and results in no significant difference in performance.
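The distinction, and why it often vanishes, can be made concrete with the two textbook update rules. This is a generic sketch: the states, operator names, learning rate alpha and discount gamma below are illustrative, not the paper's tuned values:

```python
# Q-learning (off-policy) bootstraps from the look-ahead maximum over the
# next state's actions; SARSA (on-policy) bootstraps from the action
# actually taken next. When the greedy next action is the one actually
# taken (no tie broken differently), the two updates coincide.

def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    look_ahead = max(Q[s_next].values())          # best next action
    Q[s][a] += alpha * (r + gamma * look_ahead - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])

Q1 = {0: {"sine": 0.0, "cosine": 0.0}, 1: {"sine": 1.0, "cosine": 0.2}}
Q2 = {0: {"sine": 0.0, "cosine": 0.0}, 1: {"sine": 1.0, "cosine": 0.2}}
q_learning_update(Q1, 0, "sine", r=1.0, s_next=1)
sarsa_update(Q2, 0, "sine", r=1.0, s_next=1, a_next="sine")
print(Q1[0]["sine"] == Q2[0]["sine"])   # True
```

The updates differ only when the next action actually executed is not the greedy one, which corresponds to the Q-table tie case mentioned above.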

Concluding Remarks
In this paper, we have described a novel hybrid QLSCA that uses a combination of the sine, cosine, Lévy flight, and crossover operators. Additionally, we have applied the QLSCA to the combinatorial test suite minimization problem as our benchmark case study.
The intertwined relationship between exploration and exploitation in both Q-learning and the QLSCA strategy needs to be highlighted. As far as the Q-learning algorithm is concerned, exploration and exploitation deal with online updating of (learned) Q-table values to identify promising search operators for future selection (using rewards and punishments). Initially, Q-learning favours exploration, but in later iterations, it favours exploitation (using a probabilistic value that decreases over time). Unlike Q-learning, the QLSCA's exploration and exploitation obtain the best possible solutions by dynamically executing the right search operator at the right time. Specifically, the exploration and exploitation of the QLSCA work synergistically with the Q-table. With the help of the Q-table, the QLSCA can eliminate the switch parameter r4 defined in the SCA (refer to Eq. 1 and Eq. 2). Therefore, the QLSCA, unlike the SCA, can adaptively identify the best operation based on the learned Q-table values. In this manner, the QLSCA's decision to explore or exploit (i.e., choosing the best search operator at any point in the search process) is directly controlled by the learned Q-table values.
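A minimal sketch of this idea, replacing a fixed switch probability with Q-table-driven operator selection, is shown below. The operator names match the QLSCA's four operations, but the +1/-1 reward scheme, the learning rate, and the tie-breaking rule are our own illustrative assumptions:

```python
import random

OPERATORS = ["sine", "cosine", "levy_flight", "crossover"]

def select_operator(q_row, rng):
    """Greedy selection from one Q-table row; ties broken randomly."""
    best = max(q_row.values())
    return rng.choice([op for op, q in q_row.items() if q == best])

def reinforce(q_row, op, improved, alpha=0.1):
    """Reward an operator that improved the solution, penalise otherwise."""
    reward = 1.0 if improved else -1.0
    q_row[op] += alpha * (reward - q_row[op])

rng = random.Random(0)
q_row = {op: 0.0 for op in OPERATORS}
# Suppose the crossover operation keeps improving the current solution:
for _ in range(5):
    reinforce(q_row, "crossover", improved=True)
print(select_operator(q_row, rng))   # crossover
```

After a few rewarded improvements, the learned values alone steer the search towards the productive operator; no r4-style switch parameter is consulted.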
Concerning the ensemble of operators, the introduction of crossover and Lévy flight within the QLSCA helps enhance the solution diversity and provides a mechanism for escaping local extrema. In addition to the fixed switching probability and the bounded magnitude of the sine and cosine functions, the fact that the sine and cosine operators are mathematically related (see Eq. 14 and Eq. 15) can be problematic.
As far as intensification and diversification are concerned, the use of either sine or cosine alone may cause the search process to become stuck at a local minimum (because their values alternate between -1 and 1). Consequently, the performance of the SCA appears to be poorer than that of the QLSCA (and the other strategies) in almost all cases. In fact, the box plot (see Fig 8) and the convergence pattern analyses (see Fig 9) confirm our observation. On a positive note, the SCA runs much faster than the QLSCA: the additional operators and Q-learning roughly double the execution time relative to the original SCA.
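The bounded-magnitude issue can be seen directly in the standard SCA position update (the general form of Eq. 1 and Eq. 2). The sketch below uses the commonly published SCA update with a fixed r1 for brevity (in the full algorithm r1 decreases over iterations); it is an illustration of the operator's structure, not the paper's implementation:

```python
import math
import random

def sca_update(x, p_best, r1, rng):
    """One SCA position update: sine branch if r4 < 0.5, else cosine.

    sin(r2) and cos(r2) are bounded in [-1, 1], so every step magnitude
    is at most r1 * |r3 * p_best - x|, which limits the escape range."""
    r2 = rng.uniform(0, 2 * math.pi)
    r3 = rng.uniform(0, 2)
    r4 = rng.random()                 # the fixed switch probability
    trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
    return x + r1 * trig * abs(r3 * p_best - x)

rng = random.Random(42)
x = 0.3
for _ in range(10):
    x = sca_update(x, p_best=0.7, r1=1.0, rng=rng)
```

Because |sin(r2)| and |cos(r2)| never exceed 1, the walk cannot jump further than the destination-scaled radius in a single move, which is one reason an unaided SCA can stall near a local optimum.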
Considering the average search operator percentage distribution, we observe the following patterns based on our experiments.
• For uniform CAs, the search operators are almost equally distributed. In such a situation, the Q-learning mechanism gives each search operator an equal opportunity to undertake the search.
• For non-uniform CAs (MCAs), Q-learning is more inclined towards the crossover operation. Unlike uniform CAs, MCAs require a different number of parameter matchings for each test case in the test suite. The crossover operation's ability to serve flexibly in both local and global searches is perhaps the main reason Q-learning favours it. However, Lévy flight is less preferred by Q-learning because the resulting values are often too extreme and cause out-of-boundary parameter matching. When reflected back inside the boundary, the selected parameter is always reset to the boundary's other endpoint (acting as an absorbing wall), which inadvertently promotes less diverse solutions.
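The boundary behaviour noted in the last bullet can be illustrated as follows. The sketch draws Lévy-distributed steps via Mantegna's algorithm (a standard construction, with the usual beta = 1.5, not a value taken from the paper) and shows how an absorbing wall pins out-of-range values to an endpoint:

```python
import math
import random

def levy_step(rng, beta=1.5):
    """One Levy-distributed step using Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
             ) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def absorb(value, lo, hi):
    """Absorbing wall: out-of-boundary values stick to the nearest endpoint."""
    return min(max(value, lo), hi)

rng = random.Random(3)
# Count how many of 1000 Levy moves from the middle of the range [0, 4]
# end up pinned to a boundary value.
hits = sum(1 for _ in range(1000)
           if absorb(2 + levy_step(rng), 0, 4) in (0, 4))
print(hits > 0)   # True
```

Because the Lévy distribution is heavy-tailed, a noticeable fraction of moves lands outside the valid value range and collapses onto the same two endpoint values, which is exactly the diversity loss described above.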
In terms of overall performance, in addition to surpassing the original SCA, the QLSCA has also outperformed many existing strategies by offering the best means in most of the table cell entries (the closest competitor is DPSO). Our statistical analyses support this observation. Putting DPSO aside, when α = 0.05, the QLSCA statistically outperforms the original SCA, the PSTG, APSO and the CS in all configurations given in Tables 3 to 10. When α = 0.10, the QLSCA is statistically better than DPSO in two of the four configuration tables (Tables 7 and 10). Therefore, we believe that the QLSCA offers a useful alternative strategy for solving the t-way test suite minimization problem.
Future work includes applying the QLSCA to other well-known optimization problems (e.g., timetabling and vehicle-routing problems) given its performance. Additionally, we are investigating how case-based reasoning and fuzzy inference systems compare with the Q-learning approach within the QLSCA.

Figure 3: Effects of Sine and Cosine on Search Radius

Figure 7: The Hash Map and Interaction Tuples for MCA(N; 2, 2^3 3^1)

Figure 8: Box Plots for Table 3

Fig 8 depicts the box plot analysis. Fig 9 highlights the convergence patterns for the best 30 runs for each CA and MCA, while Fig 10 depicts the average percentage distribution for each search operator over all 30 runs.

Figure 10: Average Search Operator Percentage Distribution for Table 3

The minimized mixed covering array can be seen in Fig 2 with 7 test cases (a reduction of 70.83% from the 24 exhaustive possibilities). Table 1 highlights the corresponding test cases mapped from the covering array.

Table 1 :
Mapping of Mixed Covering Arrays to Test Cases

The Q-learning Algorithm
Let s_t be the state at a particular instance t
Let a_t be the action at a particular instance t
for each state S = {s_1, s_2, ..., s_n} and action A = {a_1, a_2, ..., a_n} do
    Set Q_t(s_t, a_t) = 0
Randomly select an initial state, s_t
while stopping criteria not met do
    From the current state s_t, select the best action a_t from the Q-table
    Execute action a_t and get the immediate reward/punishment r_t using Eq. 7
    Get the maximum Q value for the next state s_{t+1}
    Update α_t using Eq. 6
    Update the Q-table entry using Eq. 5
    Update the current state, s_t = s_{t+1}
Return the updated Q(s, a) table

Algorithm 5: The QLSCA Strategy for t-way Test Suite Generation
Let s_t be the state at a particular instance t
Let a_t be the action at a particular instance t
for each state S = {s_1, s_2, ..., s_n} and action A = {a_1, a_2, ..., a_n} do
    Set Q_t(s_t, a_t) = 0
Generate the interaction tuple list based on the values of t, k and v (refer to Fig 7)
Randomly select an initial state, s_t
while the interaction tuple list is not empty do
    for iteration = 1 till max_iteration do
        for population_count = 1 till population_size do
            Set the initial r_1 using Eq. 3
            Choose Step B or Step C probabilistically based on Eq. 12
            /* Step B: Exploration for Q-table update */
            for each state S = {s_1, ..., s_n} and action A = {a_1, ..., a_n} in random order do   // loop for 1 episode
                From the current state s_t, select the best action a_t from the Q-table
                if action a_t == sine operation then update X_i^t using Eq. 1
                else if action a_t == cosine operation then update X_i^t using Eq. 2
                else if action a_t == Lévy flight motion then update X_i^t using Eq. 10
                else if action a_t == crossover operation then update X_i^t using Eq. 11
                Execute action a_t and get the immediate reward/punishment r_t using Eq. 7
                Get the maximum Q value for the next state s_{t+1}
                Update α_t using Eq. 6
                Update the Q-table entry using Eq. 5
                Update the current state, s_t = s_{t+1}
            /* Step C: Exploitation of the learned Q-table */
            From the current state s_t, select the best action a_t from the Q-table and update X_i^t with the corresponding operator (Eq. 1, 2, 10 or 11)
            Update the current state, s_t = s_{t+1}
        Obtain the best result (X_best) from the updated population, X
    Add the test case X_best to the final test suite F_s and delete the interaction tuples it covers from the list

Table 2 :
Algorithm Parameters for the Strategies of Interest

Table 3 :
Time and Size Performances for SCA and QLSCA

Table 4 :
Size Performance for CA(N ; 2, 3 k ) where k is varied from 3 to 12

Table 5 :
Size Performance for CA(N ; 3, 3 k ) where k is varied from 4 to 12

Table 6 :
Size Performance for CA(N ; 4, 3 k ) where k is varied from 5 to 12

Table 7 :
Size Performance for CA(N ; 2, v 7 ) where v is varied from 2 to 7

Table 8 :
Size Performance for CA(N ; 3, v 7 ) where v is varied from 2 to 7

Table 9 :
Size Performance for CA(N ; 4, v 7 ) where v is varied from 2 to 7

Table 10 :
Size Performance for CA(N ; t, v 10 ) where t is varied from 2 to 4

Table 11 :
Wilcoxon Rank-Sum Tests for Tables 3 to 10 with the QLSCA as the Control Strategy

In Table 4, the QLSCA contributes most of the best mean test sizes (of 10 entries); the PSTG, the CS, and the SCA do not contribute any of the best means. Concerning the best test size, the QLSCA also outperforms the other strategies with 90% (9 of 10 entries). The other strategies contribute 50% of the best results (5 of 10 entries), except for the PSTG, which contributes only 30% (3 of 10 entries).

In Table 5, we observe that the QLSCA has the best mean test size 77.7% of the time (7 of 9 entries). The runner-up is the CS with 22.22% (2 of 9 entries). The SCA and the other strategies do not contribute to the best mean test size. Concerning the best test size, the QLSCA has the best performance with 66.66% (6 of 9 entries). DPSO comes in second with 55.55% (5 of 9 entries). APSO comes in third with 33.33% (3 of 9 entries). The CS comes in fourth with 22.22% (2 of 9 entries). Finally, the SCA and the PSTG come in last with 11.11% (1 of 9 entries) each.

Concerning the results in Table 6, there is no contribution from the other strategies because the QLSCA dominates the best mean test size with 100% (8 of 8 entries). As for the best test size, the QLSCA contributes 87.5% (7 of 8 entries) and DPSO contributes 25% (2 of 8 entries). APSO and the SCA contribute 12.5% (1 of 8 entries) each. The CS and the PSTG perform the worst, with no entries having the best test size.