Hybrid Symbiotic Organisms Search Optimization Algorithm for Scheduling of Tasks on Cloud Computing Environment

Cloud computing has attracted significant attention from research community because of rapid migration rate of Information Technology services to its domain. Advances in virtualization technology has made cloud computing very popular as a result of easier deployment of application services. Tasks are submitted to cloud datacenters to be processed on pay as you go fashion. Task scheduling is one the significant research challenges in cloud computing environment. The current formulation of task scheduling problems has been shown to be NP-complete, hence finding the exact solution especially for large problem sizes is intractable. The heterogeneous and dynamic feature of cloud resources makes optimum task scheduling non-trivial. Therefore, efficient task scheduling algorithms are required for optimum resource utilization. Symbiotic Organisms Search (SOS) has been shown to perform competitively with Particle Swarm Optimization (PSO). The aim of this study is to optimize task scheduling in cloud computing environment based on a proposed Simulated Annealing (SA) based SOS (SASOS) in order to improve the convergence rate and quality of solution of SOS. The SOS algorithm has a strong global exploration capability and uses fewer parameters. The systematic reasoning ability of SA is employed to find better solutions on local solution regions, hence, adding exploration ability to SOS. Also, a fitness function is proposed which takes into account the utilization level of virtual machines (VMs) which reduced makespan and degree of imbalance among VMs. CloudSim toolkit was used to evaluate the efficiency of the proposed method using both synthetic and standard workload. Results of simulation showed that hybrid SOS performs better than SOS in terms of convergence speed, response time, degree of imbalance, and makespan.


Introduction
Cloud computing is one of the recent developments in the field of computing which enables limitless usage of Information Technology in diverse domains such as medicine, business, mobile system, smart systems, environmental computing etc [1,2]. This has lead to rapid adoption of cloud computing in recent years, because it acts as an efficient computing paradigm for renting Information Technology (IT) services and infrastructures based on pay-peruse model [3]. Pay-per-use model eliminates the need for companies to invest in acquisition of IT infrastructures or software licenses.
Cloud services are categorized as Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) [4]. These services are provisioned to users of virtual resources which make cloud computing resources dynamic and elastic thereby creating the notion of unlimited resources. User are charged for the services they consumed on pay-per-use basis, and this flexible mode of charging users has encouraged migration of IT services to the cloud environment. The focus of this study in on IaaS cloud where computing resource are offered as services. Users subscribed for VMs for execution of their tasks, and better utilization of physical resources is directly dependent on the optimal scheduling of tasks on VMs.
Task scheduling has been one of the widely researched problems in cloud computing, but it remains a NP-hard problem [5]. Pool of Virtual resources are made available to cloud users by network of servers in IaaS layer. IaaS layer delivers hardware and associated software which enable provision of flexible and efficient computational capacities to end users. The resource management subsystem of IaaS layer is responsible for scheduling submitted tasks for execution. Scheduling of tasks on VMs is a key process on IaaS cloud, because mapping of tasks to VMs need to be carried out in an efficient manner due to heterogeneous and dynamic characteristics of VMs. Since there is no exact algorithm for finding optimal solution for NP-Complete problems, a good schedule solution can only be achieved via heuristic methods [6][7][8][9]. The objective of task scheduling algorithm is to reduce execution time and cost; the algorithm decides which VM should execute the received task. In cloud computing environment, VMs have heterogeneous processing capacities and characteristics. Therefore, load balancing among VMs needs to be taken into account when scheduling tasks, which entails careful coordination and optimization in order achieve lower makespan [10,11]. Task scheduling algorithms try to efficiently balance the load of the system taking into consideration total execution time of available VMs.
Methods proposed in the literature for solving task scheduling problems are either heuristic based or metaheuristic based. Heuristic based methods try to find optimal solution based on some predefined rules, and the quality of solutions obtained by these methods are dependent on the underlining rules and problem size. The solution obtained by heuristics search methods are not feasible and they are generated at high operating cost [12].
Metaheuristic techniques have been extensively applied to solve optimization problems. Metaheuristic methods employ a pool of candidate solutions to traverse solution space unlike the mathematical and heuristic techniques that uses single candidate solution. This attribute of metaheuristic algorithms make them perform better than mathematical and heuristic techniques. Some of the popular metaheuristic methods for solving task scheduling problems in cloud computing environment are Genetic Algorithm (GA) [13], Particle Swarm Optimization (PSO) [14], Ant Colony Optimization (ACO) [15,16], League Championship Algorithm (LCA) [11], BAT algorithm [17,18], Symbiotic Organisms Search (SOS) [10]. The idea of SOS as a metaheuristic algorithm was introduced in [19]. SOS algorithm was inspired by interactive relationship exhibited by organisms in ecosystem for survival and it was shown to perform competitively well [19] with GA, Differential Evolution (DE), PSO, Honey Bee Colony (HBC). Since the introduction of SOS algorithm, a number of researches have applied SOS to solve some practical optimization problems [10,[20][21][22][23][24][25]. Therefore, the potential of SOS in finding global solution to optimization problems exhibited so far make it attractive for further investigation and exploration.
Quality of solution and convergence speed obtained by metaheuristic algorithms can be improved by its hybridization with either a metaheuristic algorithm or local search method, by generating initial solution using heuristic search techniques or by modifying the transition operator [26]. To the best of our knowledge none of the aforementioned techniques have been explored to investigate the possible improvement of SOS in terms of convergence speed and quality of solution obtained by SOS.
In this paper, we developed a fitness function model for computing makespan taking into account utilization of VMs in order to reduce degree of imbalance among VMs. We studied task scheduling using Improved Symbiotic Organism Search(SASOS). The proposed SASOS combines SA method [27] and SOS optimization algorithm [10]. The SOS uses fewer control parameters, and has a strong exploration and faster convergence capability. SA was used to search local solution space identified by SOS which equip SASOS with exploitative ability. The objective is to obtain optimal schedules by minimizing makespan and degree of imbalance among VMs.
The main contributions of the paper are: 1. An objective function for optimum scheduling of tasks on VMs is presented taking into account the utilization level of VMs in order to reduce makespan, response time and degree of imbalance among VMs.
2. Hybridization of SOS with SA, applying SA to find optimum solution in the global solution regions identified by SOS.
3. Implementation of the proposed method in CloudSim.
4. Performance comparison of SOS and the proposed method in terms of makespan, response time and degree of imbalance.
5. Empirical analysis of convergence speed obtained by SASOS and SOS.
The organization of remaining parts of the paper is as follows. Metaheuristic algorithms applied to task scheduling problems in cloud and SOS are presented in Section 2. Section 3 describes problem formulation. Design of proposed algorithm and its description is presented in Section 4. Results of simulation and its discussion are in Section 5. Section 6 presented summary and conclusion of the paper.

Related Works
Metaheuristic algorithms [10,11,[13][14][15][16][17][18] have been applied to solve task assignment problems in order to reduce makespan and response time. These methods have proven to find optimum mapping of workloads to resources which reduces cost of computation, better quality of service, and increased utilization of computing resources. ACO, PSO, GA, and their variants are the mostly used nature inspired population based algorithms in the cloud. PSO outperforms GA and ACO in most situations [28] and has faster execution time. PSO is simpler to implement as compared to GA and ACO respectively. Workflow scheduling problems have been widely studied using PSO [14,[29][30][31] with aim of reducing communication cost and makespan. Scheduling of Independent tasks have also been studied in cloud using PSO [32][33][34] and it proved to ensure minimal makespan. Improved and hybrid versions [32,35,36] of PSO were also proposed for scheduling of tasks in cloud and they obtained better solution than those of ACO and GA. Recently, discrete version of SOS was applied to task scheduling problem in cloud computing environment [10] and SOS algorithm was found to outperform PSO and its popular variants.

Symbiotic Organisms Search
The SOS algorithm was inspired by symbiotic interactions between paired organisms in an ecosystem. Each organism denotes a potential solution to an optimization problem under consideration and has its position in the solution space. Organisms adjust their position according to mutualism, commensalism, and parasitism biological interaction models of the ecosystem. With mutualism form of interaction, the two interacting organisms benefit from the relationship and this is applied in the first phase of the algorithm. The commensalism association enables only one organism to benefit from the relationship while other is not harmed. The commensalism association is applied in the second phase of the algorithm to fine tune the solution space. With parasitism interaction, only one organism benefits while the other is harmed. The parasitism interaction technique is applied in the third phase of the algorithm. The fittest organisms survive in the solution space while the unfit ones are eliminated. The best organisms are identified as those that benefited from the three phases of the interaction. The phases of the procedure are continuously applied on the population of organisms which represents candidate solutions until the stopping criterion are reached. The basic pseudocode of SOS is presented in Algorithm 1. The quality of position of the organism depends on the fitness which defines the extent of adaptation of the organisms to the ecosystem.
SOS share some common features with most of the nature inspired algorithms. The candidate solutions are represented by population of organisms using operators to direct the search process by candidate solutions. Selection mechanism is used to keep better solutions, and it requires the settings of population size and stopping criterion before the search process starts. On the other hand, SOS does not require algorithm specific parameters unlike PSO that needs inertia weight, social and cognitive factors or GA that used crossover and mutation. Inadequate turning of these algorithm specific parameters could lead to non-optimal solutions. For (All organisms in the ecosystem) 3: Determine the best organism 4: Mutualism Phase 5: Commensalism Phase 6: Parasitism Phase 7: End For 8: While stopping condition is not exceeded The SOS algorithm procedure starts with a randomly generated population of organisms called ecosystem. Then, the positions of the organisms are updated using the three phases of SOS. In D-dimensional solution search space, a search population of n organisms is denoted as X = {X 1 , X 2 , X 3 , . . ., X n }. The position of the ith organism is denoted as X i = {x i1 , x i2 , x i3 , . . ., x id }. A fitness function is defined to determine the quality of solution obtained by an organism. Each organism represents a task schedule which is encoded in a vector of 1xn dimension, where n is the number of tasks. The elements of the vector are natural numbers in the range [0, m − 1], where m is the number of resource for executing the tasks. The best position searched by all organisms so far is represented by X best . Since task scheduling is a discrete optimization problem and SOS was originally proposed to solve a continuous optimization problem, we adapt a function proposed in [10] for mapping continuous position to a discrete position. The mapping function is defined in Eq (1). The fitness value of each organism is evaluated iteratively using their corresponding positions as input in the three phases of the algorithm as explained in the following subsections.
where m is the number of resources for executing the submitted tasks. Mutualism Phase. Suppose X i is the ith member of the ecosystem. In this phase, X j is randomly selected from the swarm of organisms to interact with X i , i 6 ¼ j for mutual benefit. The essence of the interaction is to improve extent of survival of both X i and X j in the ecosystem. The new candidate solutions for X i and X j are obtained according to Eqs (2) and (3).
where i = 1, 2, 3, . . ., ecosize; j 2 {1, 2, 3, . . ., ecosize|j 6 ¼ i}; U(0, 1) is a vector of uniformly distributed random numbers between 0 and 1. MV is the mutual relationship vector between X i and X j as defined in Eq (4). X best represents the organism with best fitness value. α and β represent the benefit factors between organism X i and X j . In a mutual relationship, an organism might benefit heavily or lightly while interacting with a mutual partner. Therefore, α and β are stochastically obtained as either 1 or 2. 1 and 2 denotes light and heavy benefits respectively. The new candidate solutions replaced the old ones if their fitness values are better than those of the old ones. In this case, X Ã i and X Ã j replace X i and X j respectively in the next generation of ecosystem. Otherwise, X Ã i and X Ã j are discarded while X i and X j survives to the next generation of the ecosystem. This scenario is captured by Eqs (5) and (6).
where f(.) denotes the fitness evaluation function. Commensalism Phase. In commensalism phase, an ith member of the ecosystem randomly selects an organism X j for interaction with X i , i 6 ¼ j. In this case, X i intends to benefit from X j , and X j neither gaining or losing from the interaction. The interaction is mathematically modelled by Eq (7).
where U(−1, 1) is a vector of uniformly distributed random numbers between −1 and 1. X best represents the organism with best fitness value similar to that of mutualism phase. X i is updated to X Ã i as computed in Eq (7), if the fitness value f ðX Ã i Þ is better that of f(X i ). The relationship for updating X i is given by Eq (8).
Parasitism Phase. In parasitism phase, an artificial parasite called parasite vector is created by cloning an ith organism X i and modifying it using randomly generated number. Then, X j is randomly selected from ecosystem, and fitness values of parasite vector and X j are computed. If the parasite vector is fitter than X j , X j is replaced by the parasite vector, otherwise X j survives to the next generation of ecosystem and parasite vector is discarded. X j is updated according to the relation in Eq (9).
PV denotes the parasite vector.

Simulated Annealing
Simulated annealing was introduced in [37,38]. It is a simple and robust optimization method inspired by the annealing process of physical systems. When a solid is heated, there will be disorderliness in the state of particles as the temperature and internal energy increases. As the cooling process is applied, the particles are ordered. The internal energy is minimized as the temperature reached the room state. The simulated annealing algorithm is obtained by using the internal energy as the objective function and temperature evolution as the control parameter [39]. The SA procedure begin with an initial solution X and created an updated solution X 0 in the neighborhood of the current solution X. The algorithm will generate a solution if the fitness value F(X Ã ) is lower than F(X). However, a higher fitness of X Ã is accepted at times with the probability define in Eq (10). This strategy enables the search procedure to avoid entrapment in local optima.
where F(X Ã ) and F(X) are the fitness evaluation functions of neighbor and current solutions respectively; and T is the temperature known as the control parameter. The algorithm perform such series of moves in order to attain equilibrium state. The temperature parameter is determined according to cooling rate. The cooling rate used in this work is adopted from [28] as defined in Eq (11). The algorithm terminates if there is no further improvement after series of decrease in temperature. The temperature mainly affects the global search performance of SA algorithm. At high initial value of temperature, the SA will have a high chance of locating global optimal solution which in consequence increase the computation time. Conversely, when the value of initial temperature is low, the chance of algorithm locating global optimal solution is limited though the computation time of the algorithm will be shorter.
where δ i is the temperature descending rate, 0 < δ < 1, i is the number of times neighbor solutions have been generated so far; T o is the initial temperature; T f is the final temperature. The basic structure of SA algorithm is presented as Algorithm 2. Generate new solution x 0 in the neighborhood of x 0 4: Compute the acceptance probability P r according to Eq (10) 5: Decide on acceptance or rejection of new solution based on P r . 6: Memorize the best solution found so far 7: Reduce the temperature 8: While stopping condition is not exceeded

Simulated Annealing based Symbiotic Organisms Search (SASOS)
With proposed SASOS, the new candidate solutions are generated by moving the previous solution towards another randomly selected solution from ecosystem using the three phases of SOS as modelled by Eqs (2) (3) and (7). The techniques increase the chance of locating the global optima, but it cannot ensure a better solution, so convergence rate is slow. The convergence will be faster, if every new candidate solution is better than the previous one. However, if only the better solutions are accepted as described in Eqs (5) (6) (8) and (9), the algorithm could be entrapped in local optima. Therefore, to improve the speed of convergence and quality of solution, SA technique is employed into solution search procedure of mutualism and commensalism phases of the SOS. The parasitism phase is left untouched because it perturbates the solution space by deleting the inactive solutions and injecting the active ones which could move the search procedure out of the local optima region. The SA can obtain better solution for the best organism of each generation of organisms, and accept poor neighbor solutions using certain probability. In each iteration, the possibility of accepting poor solutions are higher at the early stage of the search process but reduces at the later stage of the search process. The SA probability of accepting neighbor solution into new generation of organisms is given in Eq (12).
where f ðX Ã i Þ and f(X i ) are the fitness evaluation functions of neighbor and current solutions respectively, T is the control parameter. The pseudocode for SA used in SASOS is described in Algorithm 3. The steps of SASOS are described in Algorithm 4. Generate new solution X Ã i in the neighborhood of X i based on specified solution search eqaution(i.e Eqs (2) and (3) for mutualism phase; Eq (7) for commensalism phase) 3: U(0, 1) is a uniformly random generated number between 0 and 1 8: While stopping condition is not exceeded

Algorithm 4 The pseudo code for SASOS
INPUT: Set ecosize, create population of organisms X i , i = 1, 2, 3, . . ., ecosize, initialize X i , Set stopping criteria, Initialize SA parameters: Initial temperature T init , Final temperature T fin , Cooling rate δ. OUTPUT: Optimal schedule 1: Identify the best organism X best 2: While stopping criterion is not met 3: For i = 1 to ecosize 4: Mutualism Phase 5: Simulated annealing on X i according to Eq (2) using Algorithm 3 6: Simulated annealing on X j according to Eq (3) using Algorithm 3 7: Transform X i and X j using Eq (1) 8: Commensalism Phase 9: Simulated annealing on X i according to Eq (7) using Algorithm 3 10: Apply Eq (11) to reduce the temperature 11: Transform X i using Eq (1) 12: Parasitism Phase 13: Create parasite_vector 14: Update X j according to Eq (9)  15: Transform X j using Eq (1) 16: Identify the best organism X best 17: End For 18: End While

Problem Formulation
When tasks to be scheduled are received by Cloud Broker (CB), it queries Cloud Information Service (CIS) to identify the services required to execute the received tasks from the user and then schedules the tasks on the discovered services. For instance, if tasks {T 1 , T 2 , T 3 , . . ., T n } are submitted to CB in a given time interval. The processing elements (Virtual Machines) are heterogeneous having varied processing speeds and memory, indicating that a task executed on different Virtual Machines (VMs) will result in disparate execution cost. Suppose Virtual Machines {M 1 , M 2 , M 3 , . . ., M k } are available when the tasks are received by CB. The tasks are scheduled on the available VMs and execution of the tasks are done on the basis of First-Come First-Serve. Our aim is to schedule tasks on VMs in order to achieve high utilization with minimal makespan. As a result, Expected Time to Compute (ETC) of the tasks to be scheduled on each VM will be used by the proposed method to make schedule decisions. ETC values were determined using the ratio of millions instructions per second (MIPS) of a VM to length of the task [40,41]. ETC values are usually represented in matrix form, where the number of tasks to be scheduled represents the rows of matrix and number of available VMs indicates the columns of the matrix. Each row of ETC matrix represents execution times of a given task for each VM, while each column represents execution times of each task on a given VM. Since our objective is to minimize the makespan by finding the best group of tasks to be executed on VMs. Let C ij , i = 1, 2, 3, . . ., m, j = 1, 2, 3, . . ., n be the execution time of executing jth task on ith VM where m the number of VMs is and n is the number of tasks. The fitness value of each organism is determined using Eq (13), which determines the strength of the level of adaptation of the organism to the ecosystem.
makespan ¼ maxfC jk jj 2 T; j ¼ 1; 2; 3; :::; n; k 2 M; k ¼ 1; 2; 3; :::; mg ð17Þ where f(M i ) is the fitness value of virtual machine i; μ is the average utilization of virtual machines ready for execution of tasks and its essence is to support load balancing among VMs, λ i defines the utilization of virtual machine i; n is the number of tasks and m is the number VMs.

Performance metrics
The following metrics were used to evaluate the performance of the proposed method.
Makespan. Makespan is the time when the last task finished execution. Makespan is the most popular metric for measuring the performance of scheduling the methods. Lower makespan indicates efficient scheduling of task to virtual machines. Let T M i (ji) be the execution time of processing set of tasks j on virtual machine M i , makespan is defined by Eq (18). where m and n are the number of virtual machines and tasks respectively. Response Time is the time from when the job arrives in a system to the time the first task is scheduled. Response time measures the time taken for scheduling algorithm to respond to a job. Response time is determined by Eq (19).
where T firstrun is the finishing time of first task; T arrival is the arrival of the first task. Degree of Imbalance. Let T max , T min and T avg denotes the maximum, minimum, and average total execution times, respectively, among all VMs. Degree of imbalance (DI) defines the extent of load distribution among VMs according to their processing capacities, DI is determined by Eq (20).

Simulation and Results
Simulation Parameter Settings and Datasets: The performance of the proposed method was evaluated using CloudSim [42]. CloudSim is a toolkit for modeling cloud computing scenarios. Two datacenters were created each containing two hosts respectively. Each host has 20 GB ram, 1 TB storage, 10 GB/s bandwidth and time-shared VM scheduling algorithm. One of the hosts is a dual-core machine while the other host is a quad-core machine each with X86 architecture, Linux operating system, Xen virtual machine monitor (VMM), and cumulative processing power of 1000000 MIPS. 25 VMs were created each with image size of 10 GB, 0.5 GB memory, 1 GB/s bandwidth and 1 processing element. The processing power of the VMs ranges from 100 to 5000 MIPS respectively. Time-shared cloudlet scheduler and Xen VMM were used for all the VMs. Six different data sets were used to evaluate the performance of the proposed method, four of which are generated using normal, left-skewed, right-skewed, and uniform distribution presented as S1, S2, S3 and S4 Datasets respectively. Uniform distribution depicts medium size tasks, and fewer small and large size tasks. Left-skewed represents fewer small size tasks and more large size tasks while right-skewed is the opposite. Uniform distribution depicts equal number of large, medium, and small size tasks. For each distribution, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000 instances were generated. The larger instances will enable us gain insight into the scalability of performance of the algorithms with large problem sizes. The parallel workloads used for evaluation are NASA Ames iPSC/860 [43] and HPC2N [44] presented as S5 and S6 Datasets respectively. NASA Ames iPSC/860 and HPC2N set log are some of the popular standard formatted workloads for evaluating the performance of distributed systems. NASA Ames iPSC/860 set log contains information of 14,794 tasks while HPC2N set log contains information of 527, 371 tasks. The simulation was run 30 times for each algorithm. The parameter settings of the algorithms are shown in Table 1.
Results: In order to exhibit the performance of SASOS against SOS, graphs of solution quality are plotted against number of iterations for the task sizes of 100, 500, and 1000 for the six data sets as shown in Figs 1-18. These graphs indicate variation in fitness values with respect to number of iterations. The plots depict the rate of convergence and speed at which the algorithms attain the optimal solution. To evaluate the algorithm that produces better solution, makespan, response time, and degree of imbalance are obtained using each algorithm. The values for these evaluation metrics are determined when applied to six different data sets as reported in Tables 2-18. The first column of each table indicates the number of tasks scheduled, the columns named "Average", "Best", and "Worst" reports the average, the best, and worst values obtained by each algorithm. The last column of each table record the improvement in average values obtained by SASOS with respect to SOS. For makespan, it can be observed that SASOS has better average makespan for all the six data sets as shown in Tables  2-7. For the response time, SASOS outperforms SOS in most cases of data sizes and data sets as shown in Tables 8-13. For degree of imbalance, SASOS produced better degree of imbalance among VMs as compared to SOS for all the task sizes and data sets as shown in Tables 14-19.
From the convergence curve of normal distributed data sets shown in Figs 1-3, SOS converges faster to a stable state than SASOS for 100 tasks. SOS converges slower than SASOS for 500 tasks, while for 1000 tasks, both SOS and SASOS converges at the same rate. The convergence curves of left normal distributed data sets, Figs 4 and 5 indicate that SOS converges faster than SASOS but SOS is only able to locate local optima. For 1000 tasks SASOS converges faster than SOS as shown in Fig 6. SASOS converge faster than SOS for 100 and 1000 task sizes for right normal distributed data set as shown in Figs 7 and 9. Both SOS and SASOS converges at    the same rate as shown in Fig 8. For uniform distributed data set, SOS algorithm converges faster than SASOS for 100, 500, and 1000 task sizes as shown in Figs 10-12. For HPC2N, SOS converges faster than SASOS for 100 and 1000 task sizes as depicted in Figs 13 and 15, while for 500 tasks SASOS converges faster than SOS as shown in Fig 11. For NASA Ames iPSC/860, SASOS converges faster than SOS for 100 tasks as shown in Fig 16, whereas SOS and SASOS converges at virtually the same point. Furthermore, the search direction of SASOS algorithm converges to stable in less than 100 iterations in all cases except in one scenario as depicted in Fig 4

Conclusion
This study presents a hybrid Symbiotic Organisms Search (SOS) algorithm named SASOS to obtain optimal schedule of tasks in cloud computing environment. The proposed algorithm employs local search search ability of Simulated Annealing (SA) in order to improve the speed of convergence and quality of solution obtained by SOS algorithm in terms of makespan, response time, and degree of imbalance. The simulation results show that SASOS outperforms SOS in terms of convergence rate and quality of solution. The proposed method can be used to solve other optimization issues in the cloud computing system and other discrete optimization problems in different domains. In the future, we intend focus on hybridizing SOS with other effective local search and metaheuristic techniques.