A deadline constrained scheduling algorithm for cloud computing system based on the driver of dynamic essential path

To solve the problem of deadline-constrained task scheduling in cloud computing systems, this paper proposes a deadline-constrained scheduling algorithm for cloud computing based on the driver of dynamic essential path (Deadline-DDEP). A dynamic sub-deadline strategy is proposed according to the changes in the dynamic essential path of each task node during scheduling. The strategy assigns a different sub-deadline value to every task node so as to satisfy both the constraint relations among task nodes and the user-defined deadline, and it fully considers how the dynamic essential path of a task node affects its sub-deadline during scheduling. The paper also proposes the quality assessment of optimization cost strategy to solve the problem of selecting a server for each task node. Based on the sub-deadline urgency and the relative execution cost during scheduling, this strategy selects a server that not only meets the sub-deadline but also yields a much lower execution cost. In this way, the proposed algorithm completes the task graph within its deadline and minimizes its total execution cost. Finally, we evaluate the proposed algorithm through simulation experiments using Matlab. The experimental results show that, while meeting the deadline constraint, the proposed algorithm improves the total execution cost by between 10.3% and 30.8%. In view of these results, the proposed algorithm provides a better-quality scheduling solution for executing scientific application tasks in the cloud computing environment than IC-PCP, DCCP and CD-PCP.


Introduction
Cloud computing has developed on the basis of internet technologies, virtualization technologies, parallel processing technologies, distributed computing and grid computing. Cloud computing providers use a "pay-per-use" payment method, which offers on-demand network services and scalable hardware and software. In recent years, cloud computing has become well developed, because its users can lease services rather than buy large amounts of hardware and software. In this situation, Cloud [...] and CSO (Cat Swarm Optimization) [30] (Bilgaiyan S et al.). These algorithms have high time complexity and are very time consuming, so they are rarely applied to real cloud computing systems.
Recently, many effective and feasible metaheuristic solutions have been proposed. The main concept of a metaheuristic solution is that a reasonable scheduling order list of task nodes is acquired from a property analysis of the task graph under special constraint conditions, such as deadline and budget, and each task node is then mapped to the corresponding server. The classical heuristic solutions include IC-PCP (IaaS Cloud Partial Critical Paths) and IC-PCPD2 (IaaS Cloud Partial Critical Paths with Deadline Distribution) [31] (Abrishami S et al.), DCCP (Deadline Constrained Critical Path) [32] (Vahid A et al.), Deadline-MDP (Deadline-Markov Decision Process) [33] (Jia Y et al.) and CD-PCP (Cost-Driven Partial Critical Paths) [34] (Abrishami S et al.). However, these algorithms consider only the task graph and the servers themselves: they sort all task nodes and select the execution servers prior to the actual scheduling. They do not consider how the sub-deadline and execution cost change during scheduling, nor the actual computation time (cost) on the execution server.
The main contributions of this paper and a simple comparative analysis with reference [35] are summarized as follows. Reference [35] proposed a scheduling algorithm for cloud computing based on the driver of dynamic essential path, i.e., the DDEP algorithm. Building on the analysis of the dynamic essential path in that previous work [35], this paper proposes a deadline-constrained task scheduling algorithm, i.e., the Deadline-DDEP algorithm. The two algorithms have different final objectives: DDEP shortens the Makespan of the task graph in cloud computing, whereas Deadline-DDEP reduces the total execution cost while meeting the user's deadline constraint. Our previous work (the DDEP algorithm) uses different priority values and the dynamic essential path values to determine the scheduling order of all task nodes. Based on that work, this paper proposes the dynamic sub-deadline strategy to compute the sub-deadline value of every task node; the strategy fully considers how the dynamic essential path affects the sub-deadline during scheduling. To select a server for each task node, our previous work [35] schedules the task node on the server with the earlier finish time. This paper instead proposes the quality assessment of optimization cost strategy, which selects the server that not only meets the sub-deadline but also has a much lower execution cost. The experimental results show that the proposed Deadline-DDEP achieves a remarkable improvement in the total execution cost while meeting the user's deadline constraint.

Related work
The heuristic algorithms for the deadline-constrained cloud computing scheduling problem share a common feature: the sub-deadlines and the scheduling result are determined prior to the actual scheduling of the tasks. In contrast, the proposed algorithm dynamically updates the sub-deadline of task nodes during the actual scheduling process, and the scheduling result is obtained only when the task graph is fully completed. A simple comparative analysis of the proposed algorithm and the existing scheduling algorithms follows.
(1) IC-PCP
IC-PCP (IaaS Cloud Partial Critical Paths) [31] computes the EST (Earliest Start Time), EFT (Earliest Finish Time) and LFT (Latest Finish Time) of all task nodes, and then obtains the task nodes in the PCPs (Partial Critical Paths). It first schedules the unassigned task nodes without parent task nodes in the PCPs. If the current task node can finish before its Latest Finish Time, it is scheduled on the current cheapest server. When the current task node is finished, the EST, EFT and LFT of all unassigned successor task nodes are updated. The algorithm stops when there is no unassigned parent or child task node. The algorithm is simple and viable, and its time complexity is $O(n^2)$, where $n$ is the number of task nodes.
IC-PCP computes the EFT and LFT of the current task node from its own properties and the minimum execution time of its successor task nodes. It does not consider that, during scheduling, the EFT and LFT of the current task node are affected by the actual execution time and communication time of its successor task nodes. Compared with IC-PCP, the proposed algorithm dynamically updates the sub-deadline from the deadline of the task graph and the dynamic essential path of the task node during scheduling. In this way, the time range for selecting the optimal server is broadened. Because Deadline-DDEP obtains the sort order of task nodes during scheduling, its sort order is more reasonable than that of IC-PCP.
(2) DCCP
The DCCP (Deadline Constrained Critical Paths) [32] algorithm first partitions the task graph into different levels based on their respective parallel and synchronization requirements. It computes the earliest finish time of all task nodes from the average communication time and the minimum execution time. For task nodes on the same level, the sub-deadline equals the maximum of their earliest finish times. The task nodes of the CCPs (Constrained Critical Paths) are obtained from their average execution time and communication time. All task nodes in a CCP are executed on the same server, namely the cheapest server that meets their sub-deadline. The time complexity of DCCP is $O(n^2 \cdot k)$, where $n$ is the number of task nodes and $k$ is the number of server types. DCCP executes all task nodes of a CCP on one server in order to avoid communication time between task nodes; this reduces the chance of selecting a cheaper server for a single task node, which may increase the total execution cost. Compared with DCCP, the proposed algorithm uses a dynamic sub-deadline for each task node. It not only meets the deadline of the task graph but also widens the choice of cheaper servers for each task node, and thereby minimizes the total execution cost.

(3) Deadline-MDP
The main concept of the Deadline-MDP (Deadline-Markov Decision Process) [33] algorithm is to divide the task graph into independent branches and synchronization tasks. The overall deadline is divided into sub-deadlines for the branch tasks according to their minimum processing time. The optimal decision is to minimize the execution cost of each branch task within its assigned sub-deadline. Because all parallel branch tasks have the same sub-deadline, a branch with many task nodes and a longer execution path must be executed on a faster and more expensive server to meet its sub-deadline, so the total execution cost may increase. Compared with Deadline-MDP, the proposed algorithm uses a dynamic sub-deadline based on the actual execution time and communication time, which widens the choice of the optimal server.

(4) CD-PCP
The CD-PCP (Cost-Driven Partial Critical Paths) [34] algorithm searches for the partial critical paths (PCPs) according to the minimum execution time and the minimum communication time. The task nodes in a PCP are first scheduled within the user's deadline such that the execution cost is minimized. The start time of the task nodes in the PCP depends on their unscheduled parent task nodes, each of which is executed on the better server that meets its sub-deadline. This procedure continues recursively until all task nodes are scheduled successfully. Compared with the proposed algorithm, which uses a dynamic sub-deadline, CD-PCP shortens the sub-deadline of unscheduled parent task nodes, which increases their execution cost. Consequently, the total execution cost may increase.

Data model
The cloud computing system is a computer network composed of users, the network and an easily extensible scheduling algorithm. Cloud providers offer cloud computing resources and services to cloud users through different scheduling algorithms. The goal of a cloud computing scheduling algorithm is to map each task to a suitable server while meeting the user's QoS requirements. Fig 1 shows the task-scheduling model in the cloud computing system.
We first create the scheduling model by converting the cloud computing scheduling problem into a DAG scheduling problem [35]. The DAG is expressed as $G = \{Q, E, S\}$, where $Q = \{Q_1, Q_2, \ldots, Q_n\}$ is the set of task nodes, $Q_i$ represents the $i$th task node and $n$ is the number of task nodes. $E = \{e_{ij}\}$ $(i, j \in Q)$ is the set of communication costs among task nodes, and $e_{ij}$ represents the precedence constraint that $Q_i$ must complete its execution before $Q_j$ begins. $S = \{S_1, S_2, \ldots, S_m\}$ is the set of network servers, i.e., the machines that process the task nodes; $S_m$ represents the $m$th server and $m$ is the number of servers. $c$ is the execution cost set of [...]
The deadline-constrained DAG scheduling problem is described as follows. $D$ represents the user's deadline, $EST(Q_i, S_m)$ represents the earliest start time of $Q_i$ on $S_m$, and $EFT(Q_i, S_m)$ represents the earliest finish time of $Q_i$ on $S_m$. For the single entry task node $Q_i$ on $S_m$: [...] where $T_0$ represents the application start time. For the other task nodes in the DAG: [...] where $Pre(j)$ is the set of immediate predecessor task nodes of $Q_j$. After all immediate predecessor task nodes of $Q_j$ have finished, their data are transmitted to $Q_j$; $e_{ij}$ represents the communication cost between $Q_i$ and $Q_j$. When all data required by $Q_j$ have arrived, the server $S_m$ begins to process $Q_j$.
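The EST and EFT equations themselves are not reproduced in this text. As a minimal sketch, the following Python function assumes the usual list-scheduling recurrence that the description suggests: an entry node starts at $T_0$, any other node starts once the data of all its immediate predecessors have arrived, and EFT is EST plus the execution time. Server availability is ignored, and all function and parameter names are hypothetical. The rule that communication time is 0 between nodes on the same server is taken from the dynamic sub-deadline strategy described later in the paper.

```python
from typing import Dict, List, Tuple


def earliest_times(
    order: List[str],                    # task nodes in topological order
    pred: Dict[str, List[str]],          # Pre(j): immediate predecessors of each node
    comm: Dict[Tuple[str, str], float],  # e_ij: communication time on edge (i, j)
    exec_time: Dict[str, float],         # execution time of each node on its server
    server: Dict[str, str],              # server assigned to each node
    t0: float = 0.0,                     # T_0: application start time
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (EST, EFT) per node under the assumed recurrence."""
    est: Dict[str, float] = {}
    eft: Dict[str, float] = {}
    for j in order:
        parents = pred.get(j, [])
        if not parents:
            # Entry node: starts at the application start time.
            est[j] = t0
        else:
            # Wait for the slowest predecessor plus its data transfer;
            # the transfer is free when both nodes share a server.
            est[j] = max(
                eft[i] + (0.0 if server[i] == server[j] else comm[(i, j)])
                for i in parents
            )
        eft[j] = est[j] + exec_time[j]
    return est, eft
```

For a three-node graph in which Q3 depends on Q1 (same server) and Q2 (different server), the start of Q3 is determined by whichever of the two arrivals is later.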
The objective functions over all task nodes of the DAG are described as: Cost = [...] where $Q_{exit}$ is the single exit task node and $t_{ik}$ is the actual execution time of $Q_i$ on $S_k$. The final objective is to minimize the total execution cost of the task graph while meeting the user's deadline, i.e., $\min(\mathrm{Cost})$ subject to $\mathrm{Makespan} \le D$.
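The objective can be illustrated with a short check of a finished schedule. The cost formula is not reproduced in this text, so the sketch below assumes that the cost of a task node is its actual execution time multiplied by the unit price of its server, and that the Makespan is the latest finish time. Interval-based billing is not modeled. The function name and the schedule layout are hypothetical.

```python
from typing import Dict, Tuple


def total_cost_and_feasible(
    schedule: Dict[str, Tuple[str, float, float]],  # node -> (server, start, finish)
    unit_price: Dict[str, float],                   # price per time unit of each server
    deadline: float,                                # D: the user's deadline
) -> Tuple[float, bool]:
    """Return (total cost, whether Makespan <= D) under the assumed cost model."""
    cost = sum(
        (finish - start) * unit_price[srv]
        for srv, start, finish in schedule.values()
    )
    makespan = max(finish for _, _, finish in schedule.values())
    return cost, makespan <= deadline
```

A schedule is acceptable only when the second value is true; among acceptable schedules, the one with the smallest first value is preferred.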

Scheduling algorithm
The goal of the scheduling algorithm is to minimize the execution cost of the task graph while meeting the user-defined deadline. Whether the task graph finishes within the user's deadline depends on whether each task node finishes within its sub-deadline. The dynamic essential path of a task node changes constantly with the actual execution time and communication time of its predecessor task nodes. This paper therefore proposes the dynamic sub-deadline strategy, based on the changes in the dynamic essential path of each task node; the strategy fully considers how the dynamic essential path of a task node affects its sub-deadline during scheduling. Subject to the dynamic sub-deadline of each task node, the quality assessment of optimization cost strategy is then proposed, which selects a relatively cheaper server for each task node. In this way, the final objective of minimizing the total execution cost can be achieved.

Dynamic sub-deadline strategy
To describe the scheduling algorithm explicitly, we define the following terminology.
Dynamic essential path. The path of a task node is computed from the actual execution time of the task node and the communication time with its predecessor task nodes. Because this path changes during the scheduling process, it is called the dynamic essential path (DEP).
Consider the scheduling problem of a deadline-constrained DAG in the cloud computing system. The dynamic essential paths of all task nodes are obtained from the actual execution time and the communication time with their predecessor task nodes. For the pre-scheduling task nodes, the execution time and the communication time with their predecessor task nodes are uncertain, so their dynamic essential paths may change. The sub-deadline of a task node is associated with its dynamic essential path, so the sub-deadline may change as well. The dynamic sub-deadline strategy is proposed to handle this: it updates the sub-deadline and the sort order of the pre-scheduling task nodes according to their dynamic essential paths. The concrete steps are as follows.
Step1. Initialize the dynamic essential path value of all task nodes. The path length values of all task nodes are obtained by formula (7), where $t_j$ is the average execution time of $Q_j$ and $Pre(Q_j)$ is the set of predecessor task nodes of $Q_j$.
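Formula (7) is not reproduced in this text. As a hedged sketch of Step1, the following Python function assumes that the initial dynamic essential path of a node is its average execution time plus the longest path (including communication time) reaching it through any predecessor, so that an entry node's value is simply its own average execution time. This recurrence and all names are assumptions made for illustration, not the paper's exact formula.

```python
from typing import Dict, List, Tuple


def init_dep(
    order: List[str],                    # task nodes in topological order
    pred: Dict[str, List[str]],          # Pre(Q_j): predecessors of each node
    comm: Dict[Tuple[str, str], float],  # e_ij: communication time on edge (i, j)
    avg_time: Dict[str, float],          # t_j: average execution time of each node
) -> Dict[str, float]:
    """Initial dynamic essential path value of every node (assumed recurrence)."""
    dep: Dict[str, float] = {}
    for j in order:
        # Longest incoming path; 0 for entry nodes, which have no predecessors.
        longest_in = max(
            (dep[i] + comm[(i, j)] for i in pred.get(j, [])),
            default=0.0,
        )
        dep[j] = avg_time[j] + longest_in
    return dep
```

Re-running the same function with actual execution times in place of averages for finished nodes, and with $e_{ij} = 0$ for nodes placed on the same server, would give one possible way to refresh the values during scheduling.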
Step2. Search for the pre-scheduling task nodes. The entry task nodes have no predecessor task nodes in the DAG, so they are the first pre-scheduling task nodes, and their dynamic sub-deadline values are computed first. The corresponding formula is as follows: [...] where $Q_{entry}$ is an entry task node and $Q_{exit}$ is an exit task node. Sort all entry task nodes in descending order of their dynamic essential path values, and schedule first the entry task node with the longest dynamic essential path, because its finish time indirectly influences the Makespan. Following this order, select the optimal server for each entry task node by the quality assessment of optimization cost strategy. Then update the dynamic essential path, execution time and execution server of all entry task nodes. The corresponding formula is as follows: [...] where $t_{entry,k}$ is the execution time of $Q_{entry}$ on $S_k$. When all entry task nodes have finished, their successor task nodes become pre-scheduling task nodes. Update the dynamic essential path values of all pre-scheduling task nodes by formula (7), and compute their dynamic sub-deadline values by formula (10), where $AllCurPre$ is the set of pre-scheduling task nodes. Sort all pre-scheduling task nodes in descending order of their dynamic essential path values, and use the quality assessment of optimization cost strategy to schedule them according to this order and their sub-deadline values. To compute the dynamic essential path accurately, the communication time (cost) is set to 0, i.e., $e_{ij} = 0$, when $Q_i$ is a predecessor task node of $Q_j$ and the two task nodes are scheduled on the same server. When all pre-scheduling task nodes have finished, update their computation cost, processing servers and dynamic essential path values. The formula is as follows: [...]
Step3. Schedule all exit task nodes. The dynamic sub-deadline strategy sets the dynamic sub-deadline value of all exit task nodes to $D$. Update the dynamic essential path values of all exit task nodes by formula (7). Sort all exit task nodes in descending order of their dynamic essential path values, and use the quality assessment of optimization cost strategy to schedule them according to this order and their dynamic sub-deadline values.
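The bookkeeping in Step2 and Step3, finding the pre-scheduling task nodes and ordering them, can be sketched as follows. The sketch covers only what the text states directly: a node becomes a pre-scheduling node once all of its predecessors have finished, and pre-scheduling nodes are handled in descending order of their dynamic essential path value. The sub-deadline formulas are not included, and the function names and sample values are hypothetical.

```python
from typing import Dict, List, Set


def pre_scheduling_nodes(
    nodes: List[str],
    pred: Dict[str, List[str]],  # predecessors of each node
    finished: Set[str],          # nodes that have already been executed
) -> List[str]:
    """Unfinished nodes whose predecessors have all finished."""
    return [
        q for q in nodes
        if q not in finished and all(p in finished for p in pred.get(q, []))
    ]


def scheduling_order(ready: List[str], dep: Dict[str, float]) -> List[str]:
    """Sort pre-scheduling nodes by dynamic essential path value, longest first."""
    return sorted(ready, key=lambda q: dep[q], reverse=True)
```

With hypothetical path values of 2, 5 and 3 for three entry nodes Q1, Q2 and Q3, `scheduling_order` returns Q2, Q3, Q1, so the node on the longest path is given a server first.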

Quality assessment of optimization cost strategy
Subject to the dynamic sub-deadline value, this paper proposes the quality assessment of optimization cost strategy to solve the problem of selecting a scheduling server for each task node. The strategy takes a broader view of the total execution cost: it selects the optimal server for each task node according to its sub-deadline, execution cost and finish time on each server, so that the current task node and its successor task nodes have a lower execution cost. The concrete steps are as follows.
$Q_{curr}$ represents the current task node, and $SD(Q_{curr})$ is defined as its dynamic sub-deadline. $FT_{Max}(Q_{curr})$ and $FT_{Min}(Q_{curr})$ represent the maximum and minimum finish time of $Q_{curr}$ over all servers. $Cost_{cheapest}(Q_{curr})$ represents the cheapest execution cost of $Q_{curr}$ over all servers. $Cost_{Max}(Q_{curr})$ and $Cost_{Min}(Q_{curr})$ represent the maximum and minimum execution cost of $Q_{curr}$ over all servers. The time quality and cost quality of $Q_{curr}$ on each server are as follows: [...]
$$CQ(Q_{curr}, S_j) = \frac{Cost(Q_{curr}, S_j) - Cost_{cheapest}(Q_{curr})}{Cost_{Max}(Q_{curr}, S_j) - Cost_{Min}(Q_{curr}, S_j)} \quad (13)$$
where $TQ(Q_{curr}, S_j)$ measures how close the finish time of $Q_{curr}$ on $S_j$ is to its dynamic sub-deadline, i.e., the finish-time urgency of $Q_{curr}$. When $TQ(Q_{curr}, S_j)$ is negative, $Q_{curr}$ cannot finish within its dynamic sub-deadline on $S_j$, so $Q_{curr}$ is not scheduled on $S_j$. A larger positive value means that the finish time of $Q_{curr}$ on $S_j$ is farther from its dynamic sub-deadline; a smaller positive value means that it is closer.
$CQ(Q_{curr}, S_j)$ measures how far the execution cost of $Q_{curr}$ on $S_j$ is from the cheapest execution cost over all servers, which is used to avoid selecting a server with worse performance and a higher execution cost. $QM(Q_{curr}, S_j)$ is defined to select a more reasonable server for $Q_{curr}$, i.e., a server that not only has a lower execution cost but also meets the dynamic sub-deadline. A larger $QM(Q_{curr}, S_j)$ value means that the finish time of $Q_{curr}$ on $S_j$ is farther from its dynamic sub-deadline and its execution cost on $S_j$ is further above the cheapest execution cost; in contrast, a smaller value means that the finish time is closer to the dynamic sub-deadline and the execution cost is closer to the cheapest execution cost. The QM formula is as follows:
$$QM(Q_{curr}, S_j) = TQ(Q_{curr}, S_j) + CQ(Q_{curr}, S_j) \quad (14)$$
The quality assessment of optimization cost strategy selects the server with the smaller $QM(Q_{curr}, S_j)$ value to schedule $Q_{curr}$.
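The server selection can be sketched in Python as follows. CQ and QM follow formulas (13) and (14). The TQ formula is not reproduced in this text, so the sketch assumes the normalized slack $(SD - FT(Q_{curr}, S_j)) / (FT_{Max} - FT_{Min})$, which matches the described behaviour (negative when the sub-deadline is missed, smaller when the finish time is closer to the sub-deadline). It also assumes that $Cost_{cheapest}$ equals $Cost_{Min}$ and that a zero denominator contributes 0. Setting QM to infinity for a server that misses the sub-deadline follows the illustrative example in the paper. The function name and the data layout are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple


def select_server(
    sub_deadline: float,            # SD(Q_curr)
    finish_time: Dict[str, float],  # FT(Q_curr, S_j) for every candidate server
    cost: Dict[str, float],         # Cost(Q_curr, S_j) for every candidate server
) -> Tuple[Optional[str], Dict[str, float]]:
    """Return (server with the smallest QM or None, QM value per server)."""
    ft_span = max(finish_time.values()) - min(finish_time.values())
    cost_cheapest = min(cost.values())  # assumed equal to Cost_Min
    cost_span = max(cost.values()) - cost_cheapest

    qm: Dict[str, float] = {}
    for s in finish_time:
        slack = sub_deadline - finish_time[s]
        if slack < 0:
            # The sub-deadline would be missed: this server is rejected.
            qm[s] = math.inf
            continue
        tq = slack / ft_span if ft_span > 0 else 0.0                    # assumed TQ
        cq = (cost[s] - cost_cheapest) / cost_span if cost_span > 0 else 0.0  # formula (13)
        qm[s] = tq + cq                                                 # formula (14)

    best = min(qm, key=lambda s: qm[s])
    return (None if math.isinf(qm[best]) else best), qm
```

When two servers cost the same, the one that finishes closer to the sub-deadline obtains the smaller QM and is selected; when no server meets the sub-deadline, the function returns `None` for the server.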
To help the reader understand the proposed scheduling algorithm clearly, its flowchart is shown in Fig 2.

An illustrative example
This paper converts a workflow into the DAG shown in Fig 3. The computation time of each task node on the three different (heterogeneous) server types is given in Table 1. It is assumed that three server types (S1, S2, S3) are used to schedule the DAG, that many servers of each type are available, and that all servers are connected by communication links of the same capacity. Thus, the communication time between task nodes is determined by the edges of the DAG shown in Fig 3. The time interval of the computation server is assumed to be 10. The unit prices of S1, S2 and S3 are 5, 2 and 1, respectively. The deadline of the workflow in Fig 3 is 40 time units. We now demonstrate the implementation process of the Deadline-DDEP algorithm.
The Deadline-DDEP algorithm is a scheduling algorithm for a deadline-constrained workflow in the cloud computing system and contains four major data phases: (1) the computation time phase, (2) the communication time phase, (3) the dynamic essential path phase, and (4) the pre-scheduling task node phase.

1. Computation time phase and communication time phase
These two phases hold the original data of the workflow shown in Fig 3. The computation time phase is a table that stores the computation time of each task node on the different servers. The communication time between task nodes is stored in an adjacency matrix.
2. The dynamic essential path phase and the pre-scheduling task node phase
The dynamic essential path phase stores the dynamic essential path of all task nodes. The pre-scheduling task node phase stores the QM value, execution time and scheduling server of all pre-scheduling task nodes. The implementation process for the task graph in Fig 3 is as follows.
Step1. Initialize the dynamic essential path of all task nodes. Compute the dynamic essential path values of all task nodes by formula (7), as shown in Table 2.
Step2. Schedule all entry task nodes. Compared with the other task nodes, the entry task nodes become pre-scheduling task nodes first; here these are Q1, Q2 and Q3. According to their dynamic essential path values from Step1, Q1, Q2 and Q3 are sorted in descending order as Q2, Q3, Q1. Compute the dynamic sub-deadline and QM values of Q1, Q2 and Q3 by formulas (8) and (12)-(14). The related values are shown in Table 1. If the finish time of $Q_i$ on $S_j$ is greater than its dynamic sub-deadline, the Deadline-DDEP algorithm sets $QM(Q_i, S_j)$ to infinity. Select the server with the smaller QM value to schedule each entry task node. The corresponding values are shown in Table 2.
Step3. Update the dynamic essential path of each task node. When Q1, Q2 and Q3 have finished, their execution time and execution server are updated as shown in Table 2. Update the dynamic essential paths of Q1, Q2 and Q3 to 2, 5 and 3 by formula (9), and update the dynamic essential paths of the other unscheduled task nodes by formula (7).
Step4. Update the pre-scheduling task node phase. After all entry task nodes have finished, their successor task nodes become pre-scheduling task nodes: when Q1, Q2 and Q3 have finished, Q4, Q5 and Q6 become pre-scheduling task nodes. According to their dynamic essential path values from Step3, Q4, Q5 and Q6 are sorted in descending order as Q5, Q6, Q4. Compute the dynamic sub-deadline and QM values of Q4, Q5 and Q6 by formulas (10) and (12)-(14). Select the server with the smaller QM value to schedule each pre-scheduling task node. The corresponding values are shown in Tables 2 and 3.
Step5. Update the dynamic essential path of each task node. When Q4, Q5 and Q6 have finished, their execution time and execution server are updated to the values shown in Tables 2 and 3. The dynamic essential paths of Q4, Q5 and Q6 are updated to 7, 15 and 13 by formula (11). Update the dynamic essential paths of the other unscheduled task nodes by formula (7).
Step6. Schedule all exit task nodes. The dynamic sub-deadline of all exit task nodes is 40. Compute the QM values of all exit task nodes by formula (14). Select the server with the smaller QM value to schedule each exit task node. The corresponding values are shown in Tables 2 and 3, and the total execution cost of the task graph in Fig 3 is shown in Table 3.
Table 2 shows, row by row, the parameter values at each step of running the proposed algorithm. A task node is in one of two states, "Pre-scheduling" or "Finished". "Pre-scheduling" means that all predecessor task nodes of the current task node have finished, so the current task node is schedulable. "Finished" means that the current task node has been executed. If the QM value of a server is infinite, the task node is not executed on that server. Table 3 shows the "Start time", "End time" and "Total cost" of every server.

Complexity analysis
Time complexity is the amount of computation required to execute the algorithm. The time complexity of the Deadline-DDEP algorithm has two components: the time complexity of the dynamic sub-deadline strategy and that of the quality assessment of optimization cost strategy. Let $n$ be the number of task nodes and $k$ the number of server types. The specific time complexity analysis is as follows.
1. The time complexity of the dynamic sub-deadline strategy has three parts. An adjacency matrix of size $n \times n$ stores the relationships (communication time) between the $n$ task nodes of the task graph. The first part is the search for the pre-scheduling task nodes, which takes $n$ operations. The second part is the computation of the dynamic essential path of all task nodes. Because the current task node has at most $n$ predecessor task nodes, computing its dynamic essential path takes at most $n$ operations, and computing the dynamic essential path of all task nodes takes at most $n \times n$ operations. The third part is the computation of the dynamic sub-deadline of all task nodes, which takes at most $n$ operations. In total, computing the sub-deadline of all task nodes takes at most $n + n \times n + n = n^2 + 2n$ operations, so the time complexity of the dynamic sub-deadline strategy is $O(n^2)$.
2. The time complexity of the quality assessment of optimization cost strategy has two parts. The first part is sorting all task nodes in descending order of their dynamic essential path, whose time complexity is $O(n \log n)$. The second part is selecting the scheduling server of every task node according to the sort order and the QM values. Computing the QM values of all task nodes takes at most $k \times n$ operations, where $k$ is the number of server types; because $n$ is far greater than $k$, its time complexity is $O(k \times n) = O(n)$. The time complexity of the quality assessment of optimization cost strategy is therefore $O(n \log n) + O(n)$.
To summarize, the time complexity of the proposed algorithm is $O(n^2) + O(n \log n) + O(n)$, which simplifies to $O(n^2)$.

Experiment result and comparison
In this section, we present simulation experiments on the Deadline-DDEP algorithm. Different types of sample task graphs are used to evaluate the performance of the proposed algorithm. There are two ways to choose the sample task graphs: one is to use a random DAG generator to create task graphs with different structures, and the other is to use a library of realistic task graphs to obtain different types of task graph. Although the latter seems to be the better choice, unfortunately no such comprehensive library is available to researchers. We therefore designed a random generator to ensure the accuracy of the simulation experiments, and used the IC-PCP, DCCP and CD-PCP algorithms as benchmarks to obtain a relatively objective evaluation. The experimental model is a typical computing model, the DAG scheduling model. The rest of this section is organized as follows: first, the experimental environment is introduced; second, the experimental parameters are presented; third, the performance results are discussed.

Experimental environment
The experimental platform is Win8 64 bit with Matlab2012, an Intel i5 CPU and 8 GB of memory. The generator depends on several input parameters set according to user requirements. The corresponding input parameters are listed in Table 4.
The following experimental results were obtained by scheduling the randomly generated DAGs with the IC-PCP, DCCP and CD-PCP algorithms.

Experiment parameters
The parameters for the deadline and cost are defined in our experiment to evaluate the performance of the proposed algorithm; they are associated with the scheduling result of the task graph. The deadline parameter of the task graph is $D$. To define $D$ precisely, we first define the following parameters: $CPFT_{max}$ and $CPFT_{min}$, which represent the maximum and minimum finish time of all task nodes in the critical path. The corresponding formula is as follows: [...] where [...] is the set of all task nodes in the critical path, $CriPre(Q_i)$ is the set of all predecessor task nodes of $Q_i$ in the critical path, and $t_{min}(Q_i)$ and $t_{max}(Q_i)$ represent the minimum and maximum execution time of $Q_i$ on all servers, i.e., the fastest and slowest execution time. Because the finish time of the task nodes in the critical path indirectly influences the completion time of the task graph, the deadline of the task graph is defined according to the $CPFT_{max}$ and $CPFT_{min}$ parameters. The corresponding formula is as follows: [...]
The total execution cost of the task graph depends on the execution cost of each task node, which in turn depends on the execution time and the unit price of the server. The unit price of $S_k$, $S_k \in S$, used in the experiment is therefore defined as follows: [...] where $\beta_{S_k}$ represents the ratio of the CPU processing capacity of $S_k$ to that of the fastest server. The unit price of all servers is in the range [0, 1], and the unit price of the fastest server is 1.

Table 4. Input parameters.
[MinComuniTime, MaxComuniTime]: the communication time range of task nodes.
[0, MaxInDegree]: the in-degree range of a task node.
$\alpha_d$: the deadline factor of the task graph, in [0, 1].
Unit price of each server: obtained by formula (19).

Performance metrics analysis
This section analyzes the scheduling results for DAGs of different structures. Bharathi et al. [36] proposed the structures of five realistic task graphs, Montage, CyberShake, Epigenomics, LIGO and SIPHT, shown in Fig 4. To evaluate the performance of the proposed algorithm, we adopt the common performance comparison metrics NC (NormalizedCost) and PSR (PlanningSuccessfulRate). NC is the main performance measure of a scheduling algorithm on a graph and is the ratio of the total execution cost to the cheapest execution cost of the task graph, defined by: [...] where $C_{cheapest}$ is the execution cost when all task nodes are executed on the cheapest server. The smaller the NC value, the better the algorithm performance; the larger the NC value, the worse the performance. Our experiments use the average NC value over several DAGs.
PSR is the ratio of the number of successfully scheduled task graphs to the total number of experimental task graphs:

$$PSR = \frac{SuccessfulPlanningNumber}{\text{total number of experimental task graphs}}$$

where SuccessfulPlanningNumber is the number of task graphs that are scheduled successfully within the defined deadline. A larger PSR value indicates better algorithm performance, and a smaller PSR value indicates worse performance. The average PSR values over several DAG graphs are used in our experiments.

1) Experimental analysis of the task graph structure. The goal is to verify the influence of the task graph structure on the scheduling algorithms in terms of NC and PSR. To show the performance of the proposed algorithm, we schedule DAG graphs of different structures and different deadline constraints, but of the same size, on the same types of servers. We set the size of the task graph to 100 and the number of server types to 5. The computation time is generated randomly in the interval [5, 10]. The communication time is generated randomly in the interval [5, 10]. The out-degree and in-degree of the task graph are also generated randomly in the interval [1, 10]. The deadline factor of the task graph is set to {0.2, 0.4, 0.6, 0.8, 1.0}. Figs 5-9 show the comparative results for the average NC and average PSR of Montage, CyberShake, Epigenomics, LIGO and SIPHT under the different algorithms, averaged over 100 runs for each deadline factor. According to the comparison of the experimental results in Figs 5-9, the average NC of the Deadline-DDEP algorithm is better than those of the IC-PCP, DCCP and CD-PCP algorithms by 10.3%, 18.3% and 30.8%, respectively. Fig 6 shows that the average NC of the Deadline-DDEP algorithm is higher than that of the IC-PCP algorithm, but lower than those of the DCCP and CD-PCP algorithms.
This is because the Deadline-DDEP algorithm selects faster, higher-priced servers to schedule task graphs in which many parallel task nodes share the same dynamic sub-deadline. The average PSR values in Figs 5-9 show that all experimental task graphs are finished within the defined deadline by the Deadline-DDEP and DCCP algorithms, whereas the IC-PCP and CD-PCP algorithms have higher failure rates. This is because the Deadline-DDEP algorithm derives the dynamic sub-deadline of every task node from its dynamic essential path and the deadline of the task graph.
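The PSR values reported above follow directly from the definition of the metric. A minimal sketch, assuming that a task graph counts as successfully planned when its completion time does not exceed its deadline (finishing exactly at the deadline is treated as a success here, which is an assumption), with hypothetical names and made-up illustrative numbers:

```python
from typing import Sequence


def planning_successful_rate(
    makespans: Sequence[float], deadlines: Sequence[float]
) -> float:
    """Fraction of task graphs whose schedule finishes within its deadline.

    ASSUMPTION: a makespan equal to the deadline counts as a success.
    """
    if len(makespans) != len(deadlines) or len(makespans) == 0:
        raise ValueError("need one deadline per task graph and at least one graph")
    successes = sum(1 for m, d in zip(makespans, deadlines) if m <= d)
    return successes / len(makespans)


# Illustrative values only: four task graphs, one of which misses its deadline.
psr = planning_successful_rate([10.0, 12.0, 9.0, 15.0], [11.0, 11.0, 11.0, 15.0])
```

An algorithm that meets the deadline for every experimental task graph, as reported for Deadline-DDEP and DCCP, has a PSR of 1 under this definition.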
2) Experimental analysis of the task graph scale. The goal is to verify the influence of task graphs of different sizes and different deadline constraints on the scheduling algorithms in terms of NC and PSR. We schedule DAG graphs of different sizes, different deadline constraints and different structures on the same types of servers. The task graphs are small, medium and large, with 20, 100 and 500 task nodes, respectively. The number of server types is 5. The computation time is generated randomly from the interval [5, 20], and the communication time is generated randomly from the interval [5, 20]. The out-degree and in-degree of the task graph are also generated randomly from the interval [1, 10]. The deadline factor of the task graph is set to {0.2, 0.4, 0.6, 0.8, 1.0}. Figs 10-12 show the comparative results for the average NC, averaged over 50 runs on the same types of servers. According to the comparison of the experimental results in Figs 10 and 11, the Deadline-DDEP algorithm achieves a better average NC on the Montage and Epigenomics task graphs than on the other task graph structures. For the same size and the same deadline, the CyberShake, LIGO and SIPHT task graphs have more parallel task nodes, so the Deadline-DDEP algorithm selects faster, higher-priced servers to schedule these parallel task nodes while meeting their dynamic sub-deadlines. For the large-scale task graphs in Fig 12, the average NC of the CyberShake, LIGO and SIPHT task graphs under the Deadline-DDEP algorithm is better than that of the Montage and Epigenomics task graphs.
For the same size and the same deadline, the CyberShake, LIGO and SIPHT task graphs have a longer dynamic essential path for the entry task node, so every task node receives a tight dynamic sub-deadline from the proposed algorithm. The proposed algorithm then selects faster, higher-priced servers to schedule every task node while meeting its dynamic sub-deadline, which makes the total execution cost higher.

3) Conclusion. The performance of the proposed algorithm is verified from two aspects. According to the analysis results shown in Figs 5-12, the proposed algorithm exhibits better performance than the IC-PCP, DCCP and CD-PCP algorithms. Because the proposed algorithm fully considers how the total execution cost is affected by the dynamic sub-deadline and the execution cost of each task node, it makes the sub-deadlines and execution costs of all task nodes more reasonable, which reduces the total execution cost while meeting the user-defined deadline. The simulation results show that the proposed algorithm performs well.

Conclusion
In this paper, we propose a deadline-constrained scheduling algorithm for the cloud computing system, based on the driver of dynamic essential path, to solve the deadline-constrained task scheduling problem. Because the scheduling model is a DAG model of parallel computing, the algorithm is generally applicable. The innovative points and significance of this paper are as follows. First, the algorithm adopts the dynamic sub-deadline strategy to handle the way the dynamic sub-deadline is affected by changes in the dynamic essential path of each task node during scheduling. Compared with existing scheduling algorithms, the dynamic sub-deadline obtained with the proposed strategy is more reasonable, which improves the planning successful rate. Second, the algorithm uses the quality assessment of optimization cost strategy to solve the problem of selecting a server for each task node. The strategy chooses the optimal server, the one with the lower time and cost quality values, according to the sub-deadline urgency and the relative execution cost during scheduling. Choosing the optimal server for each task node reduces the total execution cost while meeting the user-defined deadline. Third, the time complexity of the proposed algorithm is $O(n^2)$, which is lower than those of the traditional deadline-constrained cloud scheduling algorithms. As a result, the proposed method is simple and viable. Compared with the other deadline-constrained scheduling algorithms, the proposed algorithm performs much better.
In conclusion, the proposed algorithm is able to solve the cloud computing scheduling problem, and it offers a useful reference for solving the scheduling problems of parallel computing, distributed computing and grid computing. In future work, we will use a multiobjective heuristic algorithm to solve the communication-change application scheduling problem in cloud computing, taking load balance into account.

Supporting information
S1 File. The minimal data set. (RAR)