Sequential Insertion Heuristic with Adaptive Bee Colony Optimisation Algorithm for Vehicle Routing Problem with Time Windows

This paper presents a bee colony optimisation (BCO) algorithm to tackle the vehicle routing problem with time window (VRPTW). The VRPTW involves recovering an ideal set of routes for a fleet of vehicles serving a defined number of customers. The BCO algorithm is a population-based algorithm that mimics the social communication patterns of honeybees in solving problems. The performance of the BCO algorithm is dependent on its parameters, so the online (self-adaptive) parameter tuning strategy is used to improve its effectiveness and robustness. Compared with the basic BCO, the adaptive BCO performs better. Diversification is crucial to the performance of the population-based algorithm, but the initial population in the BCO algorithm is generated using a greedy heuristic, which has insufficient diversification. Therefore the ways in which the sequential insertion heuristic (SIH) for the initial population drives the population toward improved solutions are examined. Experimental comparisons indicate that the proposed adaptive BCO-SIH algorithm works well across all instances and is able to obtain 11 best results in comparison with the best-known results in the literature when tested on Solomon’s 56 VRPTW 100 customer instances. Also, a statistical test shows that there is a significant difference between the results.


Introduction
The vehicle routing problem (VRP) is the generic name given to all types of problems that involve a set of routes for a fleet of vehicles that use one or more depots to serve a geographically delimited town or set of customers. The VRP was proposed by Dantzig and Ramser in 1959 [1]. Its main objective is to serve all customers while imposing a limited demand on the minimum number of vehicles and/or total cost.
One of the extensions of the VRP is the vehicle routing problem with time window (VRPTW), which is concerned with a predefined time interval called the time window. In this scenario, a vehicle can visit a location only in a specified time window. In other words, after the latest time set for the time window, a vehicle is no longer accommodated. Each vehicle starts and ends at a central depot and has a limited capacity. However, if a vehicle arrives at a destination before the earliest stated time, then it must wait until the beginning of the time window.
The main objective of the VRPTW is to service all the requirements of the customers while minimising the total distance travelled by all the vehicles without violating certain constraints. These constraints include the following: (1) each customer is visited only once, (2) the sum of all the demands on a given vehicle must not be more than the maximum capacity of the vehicle, and (3) the vehicle must arrive after the earliest time and visit all the customers within a specific predefined time.
The VRPTW has been tackled by using exact approaches [2,3], heuristic approaches [4][5][6][7][8], and metaheuristic approaches [9,10]. It has also been the subject of population-based approaches, including the particle swarm optimisation (PSO) algorithm [11,12], artificial bee colony (ABC) algorithm [13], genetic algorithm [14], and ant colony optimisation (ACO) [15]. The present study uses an algorithm based on an adaptive bee colony optimisation (BCO) algorithm. The BCO algorithm was introduced by Teodorovic et al. [16]. Some experiments that tackle Solomon instances of the VRPTW have been presented. A new approach has also been proposed to improve the effectiveness and robustness of the BCO algorithm using the online (self-adaptive) parameter tuning strategy. In that work, two strategies are used for parameter tuning: offline parameter initialisation and an online parameter tuning strategy. The values of different parameters in offline parameter initialisation are fixed before the execution of the metaheuristic, whereas the values of the parameters in the online approach are controlled and updated dynamically or adaptively during the execution of the metaheuristic [17].
A review of the literature shows that the initial population used in most of the cases is generated randomly or through a greedy approach; few researchers focus on generating an initial population, where diversification plays a crucial role in the performance of population-based algorithms [17]. Therefore, we propose a sequential insertion heuristic (SIH) to achieve this diversification. The experimental results show that the online parameter tuning strategy (adaptive BCO algorithm) improves the results of the BCO algorithm. In addition, the adaptive BCO algorithm performs better when SIH is used for its initial population than when the greedy heuristic approach is used. A comparison with other state-of-the-art approaches indicates that our adaptive BCO algorithm obtains the best results in terms of achieving the main objective of the VRPTW, which is to reduce the distance travelled.
The remainder of this paper is organised as follows: Section 2 presents the VRPTW formulations. Section 3 discusses the Solomon benchmark datasets for the VRPTW. Section 4 proposes the BCO algorithm for the VRPTW. Section 5 presents the population initialisation strategy. Section 6 discusses the experimental and comparative results. Section 7 concludes the paper.

VRPTW Formulations
This section presents a mathematical formulation of the VRPTW. We start with a natural understanding of the problem before arriving at a precise formulation. The VRPTW constraints contain a set of matching vehicles, an essential depot node, a specific number of nodes of scattered customers, and a network connecting the customers to the depot. Fig 1 shows a graphic representation of a simple VRPTW and its solution. R1 and R2 represent the two routes, and every number in the network represents a customer. The arrows connect all the customers, and every vehicle must service them by starting from and finishing at the depot.
Based on Solomon's 1987 model, our formulation presupposes N+1 customer and K vehicles. Customer 0 is located at the depot node, and each arc in the network represents a connection between two nodes and specifies the direction of travel. Every route starts from the depot, passes through all the customers, and then returns to the depot. Every vehicle represents one route in the network.
Cost c ij and travel time t ij are related to every arc in the network. In Solomon's 56 VRPTW 100 customer instances, the speed of the vehicle is assumed to be unitary; that is, only one unit of time is spent to travel one unit of distance.
Every vehicle has the same capacity q k , and each customer with demand m i must be visited only once by one of the vehicles. The summation of the capacity of all the demands on the route must be equal to or less than the q k of vehicle k on the route being travelled and no vehicles should be overloaded.
The time window constraints are indicated by a predefined time interval, an earliest arrival time (e i ), and a latest arrival time (l i ). The vehicle must reach the customer before the latest arrival time; if it arrives before the earliest arrival time, then it must wait. A service time for each customer for a given loading/unloading time is also considered. Solomon assumes this loading/unloading time to be unique regardless of the quantity of the demands. Vehicles are assumed to finish their route within the total route time, which is essentially the time window. The VRPTW has the following variables and parameters: The explanations for the VRPTW formulation are as follows: Subject the equation to Eq 1 is the main objective function of the VRPTW problem. Constraint (2) indicates the maximum K routes or vehicles going out of the depot. Eq 3 specifies that every route must start and finish at the main depot. Eqs 4 and 5 ensure that every customer is visited only once by exactly one vehicle. Eqs 6 and 7 define the capacity and maximum travel time constraints, respectively. Constraints (8), (9), and (10) explain the time window. Eq 11 calculates the cost of travel, where i x is the coordinate x for customer I and i y is the coordinate y for customer i.

Solomon Benchmark Datasets for the VRPTW
In 1987, Solomon proposed six benchmark categories for the VRPTW (56 VRPTW 100 customer instances), which have since been widely used for different heuristics [6]. The present study uses these instances to test the performance of the proposed algorithms. The instances differ in the total number of vehicles, travelling time, vehicle capacity, customer location, and suitable time of service. In other words, every customer has their own time window, location (given as x and y coordinates), quantity of demand, ready time, due time, and service time needed.
All instances presuppose 100 customers and a travelling time between customers that is equal to the corresponding Euclidean distance. The 56 instances are divided into six categories based on the pattern of the customer locations and time windows. These categories are named C1, C2, R1, R2, RC1, and RC2. Category C consists of clustered customers, category R consists of remotely located customers, and category RC consists of a mix of remotely located and clustered customers.
The geographical allocation determines the travelling distance between customers. Customers in the cluster category are located closer to one another, and their travelling distance is shorter than in other categories. In contrast, customers in the remote category are located far from one another, such that their traveling distance is longer than in the category C instances. Category C instances are by nature easy to solve because of the short distances between the customers, whereas obtaining a perfect sequence of customers along each route in the case of the R category instances is difficult.
In The instances are further categorised according to the time window constraints. The instances in categories C1, R1, and RC1 have a relatively short time window, whereas the instances in categories C2, R2, and RC2 have a relatively long time window. The time windows for instances R1 and RC1 are randomly produced.
The BCO algorithm, which was introduced by Teodorovic et al. [16] as a new field of swarm intelligence, simulates the foraging behaviour of honeybees in nature. The algorithm has two main passes: the forward pass and backward pass. In the forward pass (denoted by the neighbourhood structure in our method), all the bees fly to the field and explore the search space by exchanging information about food sources, after which they return to the hive with the collected amount of nectar (denoted by distance in our method). In the backward pass, each bee chooses to continue its own exploration either as a recruiter or as a follower based on the information collected during the forward pass. When a bee becomes a recruiter, it continues its search based on the available information; conversely, when a bee becomes a follower, it selects one recruiter to guide its search process (by a roulette wheel selection) [22][23][24][25]. Fig 3 shows the pseudo-code for the BCO algorithm, which starts with the initialisation of the parameters, the values of which must be set before the algorithm is executed (offline strategy). A random solution is created for each bee in the hive (the number of bees is equal to the size of the population) as follows: All the routes are taken as empty, and the number of routes is equal to the maximum number of vehicles. A random route is chosen for all the customers, one that satisfies all the hard constraints. If the placement of all the customers is feasible on the routes, then the solution is returned; if not, then customers are randomly added or removed from the routes until a feasible solution is found.
In the forward pass, all bees in the hive start their own search without any information by applying a number of random neighbourhood structures (NBS). Then, each bee keeps the best solution it finds during the search.
In the backward pass, all bees return to the hive, and then each bee decides to continue its own exploration either as a recruiter or as a follower based on the information it collected from the forward pass stage. This decision is taken with the probability shown in Eq 2, in which exploration as a recruiter leads to a higher probability of better solutions being found.
where SN = population size and fi is a fitness function of the i th solution.

Adaptive BCO Algorithm
In the adaptive BCO algorithm, each bee in the backward pass chooses to continue its own exploration either as a recruiter or as a follower depending on the value of an unimproved counter (Fig 4). The counter increases when the solution does not improve even after a number of neighbourhoods are applied on the forward pass. Solutions with the highest value of unimproved moves are inserted in a previously defined list. The size of the list is chosen based on the primary experiment that defines 10 as the maximum number of follower bees, which in turn is based on the list (List). Solutions in the list are considered as the follower bees and the rest are the recruiters. The BCO algorithm uses the offline parameter setting, whereas the adaptive BCO algorithm uses the online parameter tuning strategy and the unimproved counter parameter (CountU-nimprovedMove), as shown in Fig 4. The performance of the algorithm depends on the number of follower bees (NF) and is very difficult to tune because of the different datasets needed to generate different settings. Here, we propose using the online parameter tuning strategy, in which the number of follower bees is defined based on an adaptive list (List) that saves the unimproved solutions of the population.

Greedy Heuristic (GH)
Population-based algorithms usually use a greedy approach to initialise the population of solutions and to improve the solutions by applying an algorithm with neighbourhood exchanges or local searches. The greedy approach used to initialise the population works as follows: For every solution in the population, the following must be conducted: 1. All the routes are taken as empty; the number of routes is equal to the maximum number of vehicles. 2. A random route that satisfies all the hard constraints must be chosen for all the customers. If all the customers are feasibly located on the routes, then the solution is saved; otherwise, the unallocated customers are inserted into a new route.

Sequential Insertion Heuristic (SIH)
Solomon [6] divides the VRP tour-building algorithms into either parallel or sequential methods. Parallel procedures are built through the concurrent construction of routes, where the number of routes is limited to a predetermined number or formed freely. Sequential procedures construct one route at a time until all the customers are scheduled. Solomon [6] derives the algorithm from five initial evaluated solution heuristics. The SIH is very effective in terms of the quality of the solution and the computational time required to find the solution [26]. The initialisation criteria for finding the initial solutions refer to the process of choosing the first customer to be inserted in the route. The most commonly used initialisation criterion is either the farthest unrouted customer, the customer with the earliest deadline, or the earliest and latest acceptable arrival time. The first customer inserted in a route is called the 'seed customer'. When the seed customer is chosen and inserted in a route, the SIH algorithm considers the insertion place for the unrouted nodes that minimises the additional distance and time necessary to include a customer in the recent and partially created route, which, in turn, determines the insertion criteria. The selection criteria in the next step attempt to maximise the advantage resulting from inserting a customer in the current partial route rather than in a new direct route.

Experimental Results
The results for each benchmark instance are represented by the distance, number of vehicles, and average distance of 31 runs, with a focus on minimising the distance. The proposed algorithm is implemented using JAVA programming language and performed on an Intel Core i3 processor. The execution times are between 30 seconds and 15 minutes.

Parameter Settings
Preliminary experiments are conducted to obtain the most appropriate values for the number of iterations, the number of recruiter (NR) and follower (NF) bees, and the final settings of the parameters (Table 1).

Neighbourhood Structures
To describe the neighbourhood structures used in this study, the following are assumed: • w, x, y, and z are customers in the same route visited in the sequence {w!x!y!z}.
• s, t, u, and v are customers in another route visited in the sequence {s!t!u!v}.
The following neighbourhood structures are used to generate a neighbour solution: • NBS1: One shift in the same route !x is removed from the path and inserted after z; the new sequence is {w!z!y!x}.
• NBS2: Two shifts in the same route !w and x are removed from the path and inserted after y and z; the new sequence is {y!z!w!x}.
• NBS3: One shift in a different route !w is removed from the first path and inserted in the other path; the new sequence of the second path is {w!s!t!u!v}.
• NBS4: Two shifts in different routes !w and x are removed from the first path and inserted in the other path; the new sequence of the second path is {w!x!s!t!u!v}. • NBS5: One swap in the same route !w and y are exchanged; the new sequence is {y!z!w!x}.
• NBS6: Two swaps in the same route !w and x are exchanged with y and z; the new sequence is {w!x!y!z}.
• NBS7: One swap in a different route !x from the first path is exchanged with t from the second; the new sequences are {w!t!y!z} and {s!x!u!v}.
• NBS8: Two swaps in different routes !x and y from the first path are exchanged with t and u from the second; the new sequences are {w!t!u!z} and {s!x!y!v}.

Comparison of BCO and Adaptive BCO
The results show that the adaptive BCO algorithm performs better than the BCO algorithm in terms of distance, number of vehicles, and average distance of 31 runs ( Table 2). The performance of the adaptive BCO algorithm is attributed to its use of the online (self-adaptive) parameter tuning strategy, which helps to improve the efficiency and robustness of the algorithm. The adaptive BCO algorithm clearly produces good results under different hard constraints in different instances.
To show the significant difference between the two algorithms, we perform the t-test and pvalue for the Solomon instances, as shown in the last column of Table 2. The results for the adaptive BCO algorithm indicate significant differences in all instances (100%), which are below the significant level (α = 0.05) in terms of distance. Furthermore, the number of vehicles is less than or equal after the modification. Fig 5 illustrates the convergence curves of the best runs of the BCO and adaptive BCO algorithms for six randomly selected instances: R1-11, R2-04, C1-09, C2-04, RC1-03, and RC2-08 (i.e., one from each category). The distance oscillates to some extent, but decreases as the number of iterations increases in both algorithms. We also observe that the results of the adaptive BCO algorithm can indicate lower distances for all instances compared with the results of the BCO algorithm. To study the consistency and reliability of the results achieved by the adaptive BCO algorithm and to obtain a sense of the solution distribution, we examine the maximum, median, minimum, lower, and upper using box and whisker plots for Solomon's 56 datasets (Fig 6). Each box plot in Fig 6 represents Fig 6 shows that the results obtained by the adaptive BCO algorithm for the 31 runs are highly consistent. All the categories of both types 1 (C1, R1, RC1) and 2 (C2, R2, RC2) indicate a small variance in the solutions. The adaptive BCO algorithm generally results in high-quality solutions, where the minimum and median are very close to each other in almost all the tested datasets, proving the robustness of the proposed algorithm. The discussion above shows that the adaptive BCO algorithm is more effective than the BCO algorithm. Therefore we use the former with SIH to attempt to further improve the results, as described in the next subsection.

Comparison of the Adaptive BCO Algorithm with the Greedy Approach Heuristic (Adaptive BCO-GH) and the Adaptive BCO Algorithm with SIH (Adaptive BCO-SIH)
The results show that the adaptive BCO-SIH algorithm performs better than the adaptive BCO-GH algorithm in terms of distance, number of vehicles, and average distance of 31 runs ( Table 3). The performance of the adaptive BCO-SIH algorithm is a result of its use of the SIH, which improves the diversification of the initial population of the adaptive BCO algorithm. The adaptive BCO-SIH algorithm clearly produces good results under different hard constraints in different instances.
To show the difference between the two algorithms, Table 3 represents the best and average distance for the Solomon instances, from which it can be seen that the adaptive BCO-SIH algorithm shows significant differences in most instances. Furthermore, the number of vehicles is also less than or equal. Fig 7 represents the distribution of solutions in the initial population of the adaptive BCO--SIH and BCO-GH algorithms for the randomly selected instances, R2-10, C1-08, C2-04, and RC1-06. The figure shows that the diversity of the solutions of the adaptive BCO-SIH algorithm is better than that of the adaptive BCO-GH algorithm. The x-axis represents the number of solutions in the population while the y-axis represents the penalty cost (distance). The quality of solutions in the initial population of the adaptive BCO-GH algorithm is represented by the triangle symbol while that of the adaptive BCO-SIH algorithm is represented by the square symbol. We also observe that the elements of the generated initial population of the adaptive BCO--SIH algorithm (as represented by the square symbol) for all the datasets are scattered far from one another, representing the distance of the quality of solutions in the population. This phenomenon is perhaps not surprising because the initial solutions are generated by the SIH as long as the hard constraint(s) is (are) respected without considering the violation of the soft constraint(s). For the greedy heuristic initialisation, the distribution of solutions (represented by the triangle symbol) is less scattered than for the SIH, indicating that the method indeed improves the quality of solutions. Table 4 compares the results of different heuristics in the literature with those of the adaptive BCO-SIH and BCO-GH based on the average distance in each category. The table shows that the adaptive BCO-SIH algorithm leads with the three new best average results in categories R1, RC1 and RC2. In addition, it is clear that the summation of the average distances for that algorithm is the best.

Comparison of the Adaptive BCO-SIH and Other State-of-the-Art Approaches
To show the significant differences between the proposed BCO-SIH algorithm and some approaches in the literature we analyse the results obtained by the proposed algorithm by conducting a statistical test, namely the Friedman's test. Table 5 summarises the rankings obtained by Friedman's test. Schulze andFahle [28] Potvin andBengio [29] Ho et al. [30] Lauet al. [ Potvin and Bengio's rank fourth, fifth and sixth, respectively. The p-value computed by the Friedman's test is 0.0407, which is below the significance interval of 95% (α = 0.05). This value shows that there is a significant difference among the observed results.
We also perform the Holm's procedure to determine whether there are significant differences between the control algorithm (the best-performing one) and the others. Table 6 shows the significance of the compared algorithms and the control algorithm [27]. The table contains the following columns: i is the ranking for the algorithm, p is the p-value, Holm is the Holm's procedure value, and in the last column, the null hypothesis is recorded as significant if the pvalue Holm's value, else it is recorded as not significant. Holm's procedure rejects those hypotheses that have a p-value Holm's procedure value.
The column null hypothesis shows the significant test, only one algorithm shows that there is significant and others show there are no significant, including our algorithm, this means the average results between these algorithms are close to the control algorithm results. However, as shown in column I, our algorithm is second ranked. Table 7 compares the best-known results of the adaptive BCO-SIH algorithm with those in the literature. The best distance is presented in bold. Distance is used as the main objective (single objective) when comparing the performance of the various algorithms.
The adaptive BCO-SIH algorithm achieves low distance routing results for 11 out of 56 datasets. This indicates that 20% obtained a lower distance than in the best-known solutions gained from various heuristics over the years. Furthermore, the results of our proposed algorithm for total distance minimisation of VRPTW are better or equal to those of the best separately available results. The adaptive BCO-SIH algorithm also yields more competitive route solutions for 25 instances with a similar number of vehicles than the best-known VRPTW solutions in the literature.

Conclusion
The adaptive BCO algorithm proposed in this study for the VRPTW problem excels in terms of both the improved quality of its solutions and shorter computational time. The results therefore show that the proposed algorithm is effective and has considerable potential for solving the VRPTW problem. The reasons for this are twofold: the use of an initialisation strategy (SIH) improves the diversification of the initial population and the online parameter tuning strategy chooses suitable parameter values based on the search progress. A comparison of the results of the proposed algorithm with those of some state-of-the-art approaches reveals that the adaptive BCO-SIH algorithm can obtain 11 new best results in Solomon's 56 VRPTW 100 customer instances, which corresponds to 20% and three out of six categories. This methodology is therefore highly suitable for the VRPTW problem. In our future work, we aim to investigate the performance of this approach and attain a balance between exploration and exploitation by enhancing the BCO-SIH algorithm through hybridisation with local search algorithms.