Let A be any fixed cut-off restart algorithm running in parallel on multiple processors. If the algorithm is only allowed to run for up to time D, then it is no longer guaranteed that a result can be found. In this case, the probability of finding a solution within the time D becomes a measure for the quality of the algorithm. In this paper we address this issue and provide upper and lower bounds for the probability of A finding a solution before a deadline passes under varying assumptions. We also show that the optimal restart times (regarding completion probability) for a fixed cut-off algorithm running in parallel are identical to the optimal restart times for the algorithm running on a single processor. Finally, we conclude that the odds of finding a solution scale superlinearly in the number of processors.
Citation: Lorenz J-H (2016) Completion Probabilities and Parallel Restart Strategies under an Imposed Deadline. PLoS ONE 11(10): e0164605. doi:10.1371/journal.pone.0164605
Editor: Yongtang Shi, Nankai University, CHINA
Received: April 15, 2016; Accepted: September 28, 2016; Published: October 12, 2016
Copyright: © 2016 Jan-Hendrik Lorenz. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: The author received no specific funding for this work.
Competing interests: The author has declared that no competing interests exist.
Restart strategies are commonly used in probabilistic algorithms. If the current computation takes too long, the algorithm is started over with a different random seed. Deciding when to restart is an important task in designing an algorithm, and several strategies are known. Luby et al. introduced the fixed cut-off strategy (t, t, …) in [1]: after t steps, the algorithm is restarted. They also showed that, for a suitably chosen value of t, this strategy is optimal. The expected runtime is $E[A_t] = \frac{t - \int_0^t F(s)\,ds}{F(t)}$, where $A_t$ is a random variable describing the runtime of a fixed cut-off strategy using restart time t and F is the cumulative runtime distribution. However, choosing this value of t requires knowing the exact runtime distribution, which is difficult to achieve in most real applications. Luby et al. also introduced Luby's universal strategy in [1], a strategy that does not require any knowledge about the runtime distribution, and showed that it is worse than the optimal strategy by at most a logarithmic factor.
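The expected runtime of the fixed cut-off strategy, $E[A_t] = (t - \int_0^t F(s)\,ds)/F(t)$, can be evaluated numerically. The following Python sketch approximates the integral with the trapezoidal rule. The exponential and log-normal runtime distributions and all helper names are assumptions chosen for illustration; they are not distributions studied in this paper.

```python
import math
from typing import Callable

def expected_runtime_fixed_cutoff(F: Callable[[float], float], t: float, steps: int = 2000) -> float:
    """E[A_t] = (t - int_0^t F(s) ds) / F(t) for the fixed cut-off strategy with restart time t."""
    if F(t) <= 0.0:
        return math.inf  # no run can ever succeed before the cut-off
    h = t / steps
    # trapezoidal approximation of the integral of F over [0, t]
    integral = h * (0.5 * F(0.0) + sum(F(i * h) for i in range(1, steps)) + 0.5 * F(t))
    return (t - integral) / F(t)

def exponential_cdf(rate: float) -> Callable[[float], float]:
    return lambda s: 1.0 - math.exp(-rate * s)

def lognormal_cdf(mu: float, sigma: float) -> Callable[[float], float]:
    return lambda s: 0.0 if s <= 0 else 0.5 * (1.0 + math.erf((math.log(s) - mu) / (sigma * math.sqrt(2.0))))

if __name__ == "__main__":
    # Memoryless runtime: E[A_t] = 1/rate for every cut-off t.
    F_exp = exponential_cdf(2.0)
    print([round(expected_runtime_fixed_cutoff(F_exp, t), 4) for t in (0.1, 1.0, 5.0)])
    # Log-normal runtime: pick the best cut-off among a grid of candidates.
    F_ln = lognormal_cdf(0.0, 2.0)
    grid = [0.05 * i for i in range(1, 101)]
    best = min(grid, key=lambda t: expected_runtime_fixed_cutoff(F_ln, t))
    print(f"best cut-off on grid: {best:.2f}")
```

For the exponential distribution, the value does not depend on t, which reflects that restarts neither help nor hurt a memoryless runtime. The grid scan only selects the best of the tried cut-offs and is not an exact optimiser.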
One interesting question in the context of restart strategies is whether restarts are helpful at all, and if so, under which conditions. This question has been answered by van Moorsel and Wolter in [2]: they showed that restarts at time t are useful if E[A] < E[A − t ∣ A > t] holds, where A is a random variable describing the runtime of the respective algorithm. Heavy-tailed distributions exhibit this behaviour. However, such a condition cannot always be analysed, since the expected runtime may be infinite.
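To see the criterion E[A] < E[A − t ∣ A > t] at work, the sketch below evaluates both sides in closed form for a hyperexponential runtime (a mixture of two exponential distributions). This distribution is our assumption, chosen only because it has a decreasing hazard rate and simple formulas; for a single exponential distribution both sides coincide, so restarting does not help.

```python
import math
from typing import Sequence

def mean_runtime(weights: Sequence[float], rates: Sequence[float]) -> float:
    """E[A] for a mixture of exponential distributions."""
    return sum(w / r for w, r in zip(weights, rates))

def mean_residual_runtime(weights: Sequence[float], rates: Sequence[float], t: float) -> float:
    """E[A - t | A > t] for the same mixture."""
    # Surviving past t reweights each component by its survival probability exp(-rate * t).
    surv = [w * math.exp(-r * t) for w, r in zip(weights, rates)]
    total = sum(surv)
    # By memorylessness, a surviving component still needs 1/rate time on average.
    return sum(s / total / r for s, r in zip(surv, rates))

def restart_is_useful(weights: Sequence[float], rates: Sequence[float], t: float) -> bool:
    """Criterion E[A] < E[A - t | A > t] for restarting at time t."""
    return mean_runtime(weights, rates) < mean_residual_runtime(weights, rates, t)

print(restart_is_useful([0.9, 0.1], [10.0, 0.1], 1.0))   # mostly fast runs, occasionally very slow ones
print(restart_is_useful([1.0], [2.0], 1.0))              # single exponential: no benefit
```

In the mixture, a run that is still active at time t is most likely one of the slow ones, so its remaining expected runtime exceeds that of a fresh start.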
Restart strategies are a way to obtain a solution for a hard problem in less time on average. Another possibility is to increase the available computational power. For some years now, even personal computers have usually had more than one processor, which makes parallel processing a viable option. Of course, restart algorithms can also be run on several processors at once. Luby and Ertel noted in [3] that a fixed cut-off strategy is not necessarily optimal in the case of parallel computing. In 2011, Shylo et al. showed in [4] that the optimal restart value (regarding expected runtime) for a single-process fixed cut-off strategy does not necessarily coincide with the optimal value in the parallel case. Under certain conditions, they showed that the optimal restart time in the parallel case is greater than the restart time in the single-process case. They also showed that the speedup ratio is sublinear. In contrast, Cire et al. showed in 2014 [5] that the speedup ratio of Luby's universal strategy scales asymptotically linearly with the number of processors.
Closely related to the idea of restarts is a concept called portfolio. The basic idea of a portfolio is to run different algorithms solving the same problem, either in parallel on multiple processors or sharing a single processor. An introduction to portfolio theory can be found in [6]. Shylo et al. examined the relationship between restart strategies and portfolios in [7]. They found that the speedup ratio between a restarted algorithm running on n processors and two restarted algorithms running on n processors in total is bounded by a small constant.
In several real-life scenarios, algorithms are not allowed to run for an arbitrarily long time; instead, the run is aborted after a fixed time D, commonly called a deadline. Under these circumstances, it cannot be guaranteed that a solution will be found. Therefore, the probability of finding a solution within the deadline is, in some cases, a more relevant measure than the expected runtime (which might not even exist). Van Moorsel and Wolter showed in [8] that a local extremum of this probability is attained at the so-called equi-hazard intervals. This means that if $t_1, t_2, \dots, t_k$ with $t_1 + \dots + t_k = D$ are the restart times, then they are a local extremum iff
$$\frac{f(t_1)}{1 - F(t_1)} = \frac{f(t_2)}{1 - F(t_2)} = \dots = \frac{f(t_k)}{1 - F(t_k)}.$$
Here F is the runtime distribution and f is its derivative, the density function.
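The equi-hazard condition compares the hazard rate h(t) = f(t)/(1 − F(t)) across all restart intervals. The sketch below checks it for a Weibull runtime distribution, which we assume purely for illustration: with shape 2 the hazard h(t) = 2t is strictly increasing (scale 1), so only equal intervals are equi-hazard, while with shape 1 (the exponential case) the hazard is constant and every split of the deadline qualifies. The check covers only the extremum condition; it does not decide whether an extremum is a maximum.

```python
import math
from typing import Sequence

def weibull_cdf(s: float, shape: float, scale: float = 1.0) -> float:
    return 1.0 - math.exp(-((s / scale) ** shape))

def weibull_pdf(s: float, shape: float, scale: float = 1.0) -> float:
    z = s / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))

def hazard(s: float, shape: float, scale: float = 1.0) -> float:
    """Hazard rate h(s) = f(s) / (1 - F(s))."""
    return weibull_pdf(s, shape, scale) / (1.0 - weibull_cdf(s, shape, scale))

def is_equi_hazard(intervals: Sequence[float], shape: float, scale: float = 1.0,
                   rel_tol: float = 1e-9) -> bool:
    """Check h(t_1) = ... = h(t_k) up to a relative tolerance."""
    rates = [hazard(t, shape, scale) for t in intervals]
    return all(math.isclose(r, rates[0], rel_tol=rel_tol) for r in rates)

print(is_equi_hazard([1.0, 1.0, 1.0], shape=2.0))   # equal intervals
print(is_equi_hazard([0.5, 1.5], shape=2.0))        # increasing hazard, unequal intervals
print(is_equi_hazard([0.2, 1.7, 3.0], shape=1.0))   # constant hazard
```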
Our contribution: We consider restarted algorithms running in parallel on several processors, all using the fixed cut-off strategy. We impose a deadline on such algorithms and analyse the probability of finding a solution within the deadline (Theorem 3.2). We then compare the probability of finding a solution on a single processor with the probability of finding a solution on multiple processors. We provide upper and lower bounds on the probability of multiple processors finding a solution under varying assumptions regarding both the restart times and the deadline. The respective bounds are given in Theorem 3.4 and Theorem 3.5. We also show that the optimal restart time regarding completion probability for a fixed cut-off algorithm running on a single processor is also optimal for the algorithm running in parallel on multiple processors. This result is summarised in Theorem 4.2.
In this work, we consider randomised algorithms and use random variables to describe their runtime. We write Pr(X) for the probability of an event X. Let $\mathcal{A}$ be a randomised algorithm and A be a random variable describing its runtime on the respective input. The cumulative runtime distribution is given by F(t) = Pr(A ≤ t) for $t \ge 0$, i.e. the probability that $\mathcal{A}$ finds a solution within time t. We use f to denote the probability density function $f(t) = \frac{d}{dt}F(t)$. Apart from their existence, we make no further assumptions about the cumulative runtime distribution and the probability density function.
We only consider the fixed cut-off strategy in this work; whenever we talk about an algorithm or a process, we implicitly mean an algorithm using the fixed cut-off strategy. The fixed cut-off strategy was defined by Luby et al. in [1].
Strategy 2.1 ([1]). Let $\mathcal{A}$ be a randomised algorithm and let t > 0 be any time. We slightly modify the behaviour of $\mathcal{A}$ to obtain a new algorithm called $\mathcal{A}_t$. First, add a timer T to $\mathcal{A}$ which measures the elapsed time. Whenever T exceeds t and $\mathcal{A}$ has not found a solution, reset the timer T to zero and restart $\mathcal{A}$ (with new independent random choices). Repeat this until a solution is found.
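Strategy 2.1 can be simulated directly. In this minimal sketch, the base algorithm is replaced by a hypothetical sampler `sample_runtime` that returns the runtime of one fresh, independent run; a run that would exceed t is cut off and restarted. Note that the loop never terminates if F(t) = 0.

```python
import random
from typing import Callable

Sampler = Callable[[random.Random], float]  # hypothetical stand-in for one fresh run

def run_fixed_cutoff(sample_runtime: Sampler, t: float, rng: random.Random) -> float:
    """Simulate one execution of the fixed cut-off strategy with restart time t (no deadline)."""
    elapsed = 0.0
    while True:
        runtime = sample_runtime(rng)
        if runtime <= t:          # the current run finishes before the timer exceeds t
            return elapsed + runtime
        elapsed += t              # timer exceeded t: reset it and restart with fresh randomness

def estimate_mean_runtime(sample_runtime: Sampler, t: float,
                          trials: int = 50_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the expected runtime of the fixed cut-off strategy."""
    rng = random.Random(seed)
    return sum(run_fixed_cutoff(sample_runtime, t, rng) for _ in range(trials)) / trials

print(estimate_mean_runtime(lambda r: r.expovariate(1.0), 0.5))
```

For an exponential runtime with rate 1, the estimate stays close to 1 for any cut-off t, in line with the memoryless property.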
We also define the use of deadlines.
Definition 2.2. Let $\mathcal{A}$ be an algorithm and D > 0 be a time, here called the deadline. We modify the behaviour of $\mathcal{A}$ to obtain an algorithm which uses a deadline. First, add a timer T which measures the elapsed time. If T exceeds D, then the computation of $\mathcal{A}$ is aborted even if it is not complete.
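Combining Strategy 2.1 with Definition 2.2 gives the following sketch. As before, `sample_runtime` is a hypothetical stand-in for one fresh run of the base algorithm; the last run before the deadline is shortened to the remaining time, and `None` signals that the deadline passed without a solution.

```python
import random
from typing import Callable, Optional

def run_fixed_cutoff_with_deadline(sample_runtime: Callable[[random.Random], float],
                                   t: float, D: float, rng: random.Random) -> Optional[float]:
    """One execution of the fixed cut-off strategy (restart time t) aborted at deadline D.

    Returns the time at which a solution is found, or None if the deadline passes first.
    """
    elapsed = 0.0
    while elapsed < D:
        budget = min(t, D - elapsed)   # the final run is cut short by the deadline
        runtime = sample_runtime(rng)
        if runtime <= budget:
            return elapsed + runtime
        elapsed += budget
    return None

rng = random.Random(0)
print(run_fixed_cutoff_with_deadline(lambda r: 3.0, 5.0, 10.0, rng))  # first run succeeds at time 3
print(run_fixed_cutoff_with_deadline(lambda r: 3.0, 2.0, 10.0, rng))  # every run is cut off at t = 2
```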
3 Completion Probabilities and Parallel Restart Strategies
The probability of a single process finishing within a given time frame was calculated by Wu [9]:
Theorem 3.1 ([9]). Let D be the deadline and t be the restart time, and let $A_t$ be a random variable describing the runtime of an algorithm with restart time t. Then the number of restarts is $k = \lfloor D/t \rfloor$, and the leftover time is $t_0 = D - k \cdot t$. The probability that the algorithm running on a single processor finds a solution within the deadline is:
$$\Pr(A_t \le D) = \sum_{i=0}^{k-1} F(t)\,(1 - F(t))^{i} + (1 - F(t))^{k}\,F(t_0). \quad (1)$$
The expected runtime of the algorithm, conditioned on the deadline being met, is given by Eq (2); it is expressed in terms of the expected runtime of the fixed cut-off strategy without any deadline.
This probability can be equivalently stated as $\Pr(A_t \le D) = 1 - (1 - F(t))^{k}(1 - F(t_0))$. These calculations can easily be adapted to the case of parallel restart strategies.
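The closed form $\Pr(A_t \le D) = 1 - (1 - F(t))^{k}(1 - F(t_0))$ with $k = \lfloor D/t \rfloor$ and $t_0 = D - k t$ translates directly into code. This sketch assumes F(0) = 0, so that a zero leftover time contributes a factor of one; the exponential distribution is an assumption used only to check the result.

```python
import math
from typing import Callable

def completion_probability(F: Callable[[float], float], t: float, D: float) -> float:
    """Pr(A_t <= D) = 1 - (1 - F(t))**k * (1 - F(t0)), k = floor(D / t), t0 = D - k * t."""
    k = math.floor(D / t)   # number of full restart intervals before the deadline
    t0 = D - k * t          # leftover time for the final, shortened run
    return 1.0 - (1.0 - F(t)) ** k * (1.0 - F(t0))

rate = 1.5
F_exp = lambda s: 1.0 - math.exp(-rate * s)   # assumed exponential runtime distribution
print(completion_probability(F_exp, 0.7, 5.0), 1.0 - math.exp(-rate * 5.0))
```

For an exponential runtime, the result equals $1 - e^{-\lambda D}$ for every t. If rounding makes `floor(D / t)` one too small when D/t is an integer, the missing interval reappears as a leftover of length t, so the probability is unaffected.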
Theorem 3.2. Let D be the deadline, n the number of processors and $t_n$ the restart time for n processors, and let $A^{(n)}_{t_n}$ be a random variable describing the runtime of an algorithm running on n processors and using restart time $t_n$. Then the number of restarts is $k_n = \lfloor D/t_n \rfloor$ and the leftover time is $t_{n,0} = D - k_n \cdot t_n$. The probability that the algorithm finds a solution within the deadline is:
$$\Pr(A^{(n)}_{t_n} \le D) = \left(1 - (1 - F(t_n))^{k_n \cdot n}\right) + (1 - F(t_n))^{k_n \cdot n}\left(1 - (1 - F(t_{n,0}))^{n}\right). \quad (3)$$
The expected runtime of the algorithm (conditioned on the deadline being met) can be upper bounded as stated in Eq (4). Proof. First we show the claimed probability. Clearly, $(1 - F(t_n))^{k_n \cdot n}$ describes the probability that none of the processors finds a solution within their respective $k_n$ restarts under restart time $t_n$. Therefore, $1 - (1 - F(t_n))^{k_n \cdot n}$ is the probability that at least one process finds a solution within the $k_n$ full restart intervals. Following this argument, $1 - (1 - F(t_{n,0}))^{n}$ is the probability that at least one of the processes finds a solution within the leftover time. Since the leftover time is only reached if no process has succeeded before, combining both cases yields Eq (3).
To evaluate the expected runtime of the restart algorithm, we construct an algorithm B that behaves similarly to the parallel algorithm. Let $A_B$ be a random variable describing the runtime of B. The algorithm B can only return a solution after $l \cdot t_n$ steps, where $1 \le l \le k_n$ and $l \in \mathbb{N}$, or after $k_n \cdot t_n + t_{n,0}$ steps. In other words, if one of the processors finds a solution, algorithm B waits until it is supposed to restart and only then returns the solution. Clearly, B never returns a solution earlier than the original algorithm, so its runtime is an upper bound. Therefore, we have:
In the first term, $t_n \cdot (1 - (1 - F(t_n))^{n})$, the factor $1 - (1 - F(t_n))^{n}$ is the probability that a solution is found within the first restart interval; in this case B runs for exactly $t_n$ steps. Otherwise, further restart intervals are required. For $D \ge t_n$ this can be simplified to:
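Again, the probability can be expressed equivalently by $\Pr(A^{(n)}_{t_n} \le D) = 1 - (1 - F(t_n))^{k_n \cdot n}(1 - F(t_{n,0}))^{n}$. This notation is used in some of the proofs. In the special case $t = t_n$ and $t_0 = 0$, i.e. $D = k \cdot t$, this leads to a first result:
$$\Pr(A^{(n)}_{t} \le D) = 1 - (1 - F(t))^{k \cdot n} = \Pr(A_t \le n \cdot D).$$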
In other words, a single process needs a deadline n times longer to achieve the same probability of finding a solution as an algorithm running in parallel on n processors. In the following, we analyse the probabilities of both the parallel strategy and the single strategy when the deadline is fixed. We start by imposing some restrictions on the allowed strategies and later analyse strategies with relaxed restrictions. In the first case, we consider identical restart times for both strategies and no leftover time.
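The closed form of Theorem 3.2 and the n-fold deadline relation can be checked with a short Monte Carlo experiment. This sketch is illustrative only: the Weibull runtime distribution (scale 3, shape 2), the parameter values and the helper names are our assumptions, and Python's `random.weibullvariate` serves as the sampler for that distribution.

```python
import math
import random

SCALE, SHAPE = 3.0, 2.0   # assumed Weibull runtime distribution

def F(s: float) -> float:
    """Cumulative runtime distribution of the assumed Weibull runtime."""
    return 1.0 - math.exp(-((s / SCALE) ** SHAPE))

def completion_probability(t: float, D: float, n: int = 1) -> float:
    """1 - (1 - F(t))**(k*n) * (1 - F(t0))**n with k = floor(D/t) and t0 = D - k*t."""
    k = math.floor(D / t)
    t0 = D - k * t
    return 1.0 - (1.0 - F(t)) ** (k * n) * (1.0 - F(t0)) ** n

def simulate(t: float, D: float, n: int, trials: int = 20_000, seed: int = 7) -> float:
    """Fraction of trials in which at least one of n independent processes succeeds before D."""
    rng = random.Random(seed)
    k = math.floor(D / t)
    budgets = [t] * k + [D - k * t]   # k full runs plus the (possibly empty) leftover run
    hits = 0
    for _ in range(trials):
        # each of the n processes performs its own sequence of independent runs
        if any(rng.weibullvariate(SCALE, SHAPE) <= b for _ in range(n) for b in budgets):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print(completion_probability(1.0, 2.5, 3), simulate(1.0, 2.5, 3))
    # No leftover time: n processors with deadline D match one processor with deadline n * D.
    print(completion_probability(0.5, 2.0, 8), completion_probability(0.5, 16.0, 1))
```

The last comparison relies on D being a multiple of t; with a leftover interval (for example t = 0.6 and D = 2) the two quantities generally differ.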
Theorem 3.4. Let D be a deadline and $n = 2^i$, $i \in \mathbb{N}$, be the number of processors. Let t be a restart time and define $A_t$ to be a random variable describing the runtime of the single strategy using restart time t. Define $A^{(n)}_t$ analogously for the parallel restart strategy running on n processors. Then the following holds: (6) Proof. This can be shown by induction. First consider the case $n = 2^1$ and analyse the completion probability for two processors.
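In the setting of Theorem 3.4 (identical restart times, no leftover time), the product form gives $\Pr(A^{(n)}_t \le D) = 1 - (1 - p)^n$ with $p = \Pr(A_t \le D)$, because $1 - p = (1 - F(t))^k$. An induction over $n = 2^i$ naturally proceeds by doubling the number of processors, $p \mapsto 1 - (1 - p)^2 = 2p - p^2$. The sketch below illustrates this doubling step; it is our illustration and not a reproduction of Eq 6.

```python
def parallel_from_single(p: float, n: int) -> float:
    """Completion probability on n processors given the single-processor probability p.

    Assumes identical restart times and no leftover time, so that 1 - p = (1 - F(t))**k
    and the n processes fail independently of each other.
    """
    return 1.0 - (1.0 - p) ** n

def doubled(p: float, i: int) -> float:
    """Double the number of processors i times via p -> 1 - (1 - p)**2 = 2p - p**2."""
    for _ in range(i):
        p = 2.0 * p - p * p
    return p

for i in range(6):
    print(2 ** i, round(doubled(0.1, i), 6), round(parallel_from_single(0.1, 2 ** i), 6))
```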
We now analyse the case when restart times are not identical. We assume, however, that for an increasing number of processors the restart times are non-decreasing. Later on, we discuss whether this restriction is reasonable.
Theorem 3.5. Let D be a deadline and $n_i$ be the number of available processors. Let $t_i$ be the restart time for the algorithm running on $n_i$ processors, and let $t_i \le D$ and $t_{i+1} \ge t_i$ for all i. Define $k_i$ as the number of restarts of the algorithm running on $n_i$ processors with restart time $t_i$, where we require $k_i \ge 2$ for all i. Define $A^{(n_i)}_{t_i}$ as a random variable describing the runtime of the algorithm running on $n_i$ processors with restart time $t_i$. Define $m = i + i\lceil \log_2 k_0 \rceil$. Then (7) and (8) hold.
Proof. We start by showing the lower bound in inequality Eq 7 by induction. First, we examine the probabilities for the cases $n_0$ and $n_1$.
Therefore holds. At this point we analyse the inductive step . We start by examining the numerator. The first step follows because the leftover time $t_{i,0}$ is less than or equal to the restart time $t_i$; since F is non-decreasing, $F(t_{i,0}) \le F(t_i)$. Since the restart times are non-decreasing, the number of restarts is non-increasing, i.e., $k_0 \ge k_i$. Using this fact, we obtain .
We now move on to inequality Eq 8. First notice that . Define $n' = 2^{i + i\lceil \log_2 k_0 \rceil} = 2^m$; then it is clear that holds. The rest follows from Theorem 3.4.
4 Optimal Restart Time
For the proof of Theorem 3.5, we used the assumption that the restart times are non-decreasing for an increasing number of processors. We want to assess this assumption and evaluate under which conditions it is reasonable. Before analysing the optimal restart time, we point out that there are multiple notions of an optimal restart time. On the one hand, the restart time can be chosen such that the expected runtime is minimised; on the other hand, it can be chosen such that the completion probability is maximised. The two optimal values do not have to coincide. In Lemma 4.1, a condition for the optimal restart time regarding the expected runtime is analysed.
Let t′ be the optimal restart time (regarding the expected runtime) for an algorithm operating on one processor and let $t'_n$ be the respective optimal restart time for the algorithm operating on n processors. Define T(t′) as the expected runtime for the algorithm running on one processor and let $T_n(t'_n)$ be the respective runtime on n processors. Shylo et al. showed in Theorem 2 of [4] that the speedup ratio $T(t')/T_n(t'_n)$ never exceeds n.
They also showed that if the hazard function is unimodal and , then holds for all n. While the following lemma is not stated explicitly in their work, it is very similar to Theorem 3 there. All techniques used in our proof are taken from [4]; the following lemma is therefore implicit in their work.
Lemma 4.1. If the hazard function of the runtime distribution is unimodal and for n > m, then .
Proof. Shylo et al. showed in Corollary 2 of [4] that the expected runtime of an algorithm running on n processors and using the optimal restart time is . Using this we have:
They also showed in Theorem 1 of [4] that the hazard function is non-increasing on an interval [x1, x2] with . Here is the optimal restart time for the algorithm running on a single processor. Since it is known that both and hold, the desired property follows from the unimodality of the hazard function.
In other words, if the hazard function is unimodal and the speedup ratio of parallel computing is sublinear, then the optimal restart times strictly increase with the number of processors. This suggests that the restriction to non-decreasing restart times in Theorem 3.5 is reasonable in many cases.
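Next, we analyse the optimal restart time regarding completion probability. Van Moorsel and Wolter showed in [8] that, in the case of a single processor, the optimal restart times are at the equi-hazard intervals, i.e., if $t_1, \dots, t_k$ are optimal restart times, then they fulfil $\frac{f(t_1)}{1 - F(t_1)} = \dots = \frac{f(t_k)}{1 - F(t_k)}$. The completion probability for a single process as in Theorem 3.1 Eq 1 can be stated analogously. This result can easily be adapted to the case of multiple processors. We can express the completion probability of a fixed cut-off algorithm running on n processors, as described in Theorem 3.2, equivalently by the following equation:
$$P_n(t_1, \dots, t_k) = 1 - \prod_{i=1}^{k} (1 - F(t_i))^{n}, \qquad t_k = D - \sum_{i=1}^{k-1} t_i. \quad (9)$$
The extrema of this probability can be found by setting its derivative to zero. With $Q = \prod_{i=1}^{k} (1 - F(t_i))^{n}$, the derivative with respect to $t_j$, $1 \le j \le k-1$, is given by:
$$\frac{\partial P_n}{\partial t_j} = n\,Q \left( \frac{f(t_j)}{1 - F(t_j)} - \frac{f(t_k)}{1 - F(t_k)} \right).$$
Since $n\,Q > 0$ whenever the completion probability is less than one, setting the derivative to zero yields:
$$\frac{f(t_j)}{1 - F(t_j)} = \frac{f(t_k)}{1 - F(t_k)} \quad \text{for all } 1 \le j \le k-1. \quad (10)$$
Since the result by van Moorsel and Wolter in [8] and Eq 10 are identical, we conclude that the optimal restart times (regarding completion probability) are identical. In particular, for the fixed cut-off strategy the condition is fulfilled whenever D/t = k for some $k \in \mathbb{N}$, since then all intervals have the same length t.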
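With $t_k$ eliminated through the constraint $t_1 + \dots + t_k = D$, differentiating the product form $P_n = 1 - \prod_i (1 - F(t_i))^n$ gives $\partial P_n / \partial t_j = n\,Q\,(h(t_j) - h(t_k))$, where Q is the probability that all processes fail and h = f/(1 − F) is the hazard function. The factor n only rescales the derivative, so its zeros do not depend on the number of processors. The sketch below checks this numerically with central differences; the Weibull distribution (scale 3, shape 2, hazard h(s) = 2s/9) and the interval values are assumptions for illustration.

```python
import math
from typing import List

SCALE, SHAPE = 3.0, 2.0   # assumed Weibull runtime distribution

def F(s: float) -> float:
    return 1.0 - math.exp(-((s / SCALE) ** SHAPE))

def hazard(s: float) -> float:
    """h(s) = f(s) / (1 - F(s)) for the assumed Weibull distribution."""
    return (SHAPE / SCALE) * (s / SCALE) ** (SHAPE - 1.0)

def P(free: List[float], D: float, n: int) -> float:
    """Completion probability 1 - prod_i (1 - F(t_i))**n with t_k = D - (t_1 + ... + t_{k-1})."""
    survival = 1.0
    for t in list(free) + [D - sum(free)]:
        survival *= (1.0 - F(t)) ** n
    return 1.0 - survival

def numeric_gradient(free: List[float], D: float, n: int, h: float = 1e-6) -> List[float]:
    """Central-difference partial derivatives of P with respect to t_1, ..., t_{k-1}."""
    grads = []
    for j in range(len(free)):
        up, down = list(free), list(free)
        up[j] += h
        down[j] -= h
        grads.append((P(up, D, n) - P(down, D, n)) / (2.0 * h))
    return grads

def analytic_gradient(free: List[float], D: float, n: int) -> List[float]:
    """n * Q * (h(t_j) - h(t_k)), where Q = 1 - P is the probability that all processes fail."""
    t_k = D - sum(free)
    Q = 1.0 - P(free, D, n)
    return [n * Q * (hazard(t) - hazard(t_k)) for t in free]

for n in (1, 8):
    print(n, numeric_gradient([1.0, 1.0], 3.0, n), numeric_gradient([0.5, 1.0], 3.0, n))
```

At the equal split the gradient vanishes for both n = 1 and n = 8, and at the unequal split its sign is the same for both, as the formula predicts.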
Theorem 4.2. The optimal restart time regarding completion probability for a fixed cut-off strategy running on a single processor is identical to the optimal restart time for a fixed cut-off strategy running on multiple processors.
Another implication of Lemma 4.1 together with Theorem 4.2 should be pointed out. Consider a parallel fixed cut-off algorithm whose underlying hazard function is unimodal and whose speedup ratio is sublinear. Then the optimal restart time regarding expected runtime and the optimal restart time regarding completion probability cannot coincide for every number of processors: the former strictly increases with the number of processors, whereas the latter does not depend on it.
In this section, we analyse the results of this work. Shylo et al. used the speedup ratio to measure the effect of parallelisation on the expected runtime in [4]. To measure the 'speedup' in the case of probabilities, we use the odds ratio, which we briefly introduce at this point.
There are two common ways to describe the likelihood of an event: probabilities and odds. Probabilities describe the chance of an event happening relative to all possible outcomes, whereas odds describe the likelihood of an event relative to its complementary event. For example, if the odds of an event are 10, then the event is 10 times as likely as its complementary event. Probabilities and odds can easily be converted into each other.
Definition 5.1 ([10]). Let A be an event. Then the odds R(A) of A are defined by:
$$R(A) = \frac{\Pr(A)}{1 - \Pr(A)}. \quad (11)$$
Conversely, given the odds R(A) of an event, the probability can be obtained by:
$$\Pr(A) = \frac{R(A)}{1 + R(A)}. \quad (12)$$
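The two conversions $R(A) = \Pr(A)/(1 - \Pr(A))$ and $\Pr(A) = R(A)/(1 + R(A))$ can be written as a pair of small helper functions. The names below are our own; the only restriction is Pr(A) < 1, since certain events have infinite odds.

```python
def odds(p: float) -> float:
    """R(A) = Pr(A) / (1 - Pr(A)), defined for 0 <= Pr(A) < 1."""
    return p / (1.0 - p)

def probability_from_odds(r: float) -> float:
    """Pr(A) = R(A) / (1 + R(A)), defined for R(A) >= 0."""
    return r / (1.0 + r)

print(odds(10 / 11), probability_from_odds(10.0))
```

An event with probability 10/11 has odds 10, matching the example of an event that is 10 times as likely as its complement.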
With this definition, the odds can be used as a measure of certainty. In our case, high values indicate that it is very likely to find a solution within the deadline, while low values indicate the opposite. While this is already a good metric for our purposes, we also give a second explanation of how odds can serve as a quality metric for a randomised algorithm with an imposed deadline. Let A be such an algorithm and D the deadline used. Assume that 0 < p < 1 is the probability that A finds a solution within the deadline D, and let $A_e$ be the event of A finding a solution within the deadline. We define a new restart algorithm B based on A's behaviour: first start A; if A did not find a solution within D steps, restart A. Repeat this scheme until a solution is found. Let X be a random variable which counts the number of failed runs of A, and observe its expected value:
$$E[X] = \sum_{j=0}^{\infty} j\,(1 - p)^{j}\,p = \frac{1 - p}{p} = \frac{1}{R(A_e)}.$$
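The number of failed runs X is geometrically distributed, so $E[X] = (1 - p)/p = 1/R(A_e)$. A minimal Monte Carlo sketch makes this tangible. Here each deadline-limited run of A is modelled as an independent Bernoulli trial with success probability p, which is an assumption of the sketch, and the helper name is hypothetical.

```python
import random

def mean_failed_runs(p: float, trials: int = 100_000, seed: int = 3) -> float:
    """Monte Carlo estimate of E[X], the number of failed runs of A before the first success."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        failures = 0
        while rng.random() >= p:   # a run misses the deadline with probability 1 - p
            failures += 1
        total += failures
    return total / trials

p = 0.25
print(mean_failed_runs(p), (1.0 - p) / p)   # estimate vs. 1 / R(A_e)
```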
An increase in the odds can be measured by the odds ratio, which is defined as follows.
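Definition 5.2 ([10]). Let $A_1, A_2$ be events with $\Pr(A_1) = p_1$ and $\Pr(A_2) = p_2$. Then the odds ratio with respect to $A_1$ and $A_2$ is defined by:
$$\mathrm{OR}(A_1, A_2) = \frac{R(A_1)}{R(A_2)} = \frac{p_1 (1 - p_2)}{p_2 (1 - p_1)}. \quad (13)$$
The logarithmically scaled odds ratio is given by:
$$\log \mathrm{OR}(A_1, A_2) = \log R(A_1) - \log R(A_2). \quad (14)$$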
Therefore, if the odds ratio is greater than one, the expected number of restarts, as described above, is lower than for the original algorithm. In the following, we first provide numerical evidence that the odds ratio scales superlinearly with the number of processors if the restart times are identical. Then we show this result theoretically.
In the following evaluation, all logarithms are natural logarithms. Due to its simplicity, we start by analysing Theorem 3.4. Fig 1 shows the completion probabilities for 6 restarts. For an increasing number of processors, the probability of completing within the deadline converges to 1 much faster. For a larger number of restarts, all of the curves converge to 1 faster; the general statement, however, remains the same. Indeed, the experimental data match the theoretical results closely. To examine the results experimentally, we used a SAT solver on problems from SATLIB [11]. The results can be found in the supporting information. The completion probabilities are reported in S1 and S2 Tables. The average number of restarts, which corresponds to the upper bound on the expected runtime from Theorem 3.2, is examined in S1 Fig.
Fig 2 shows the log-scaled odds ratio comparing the probabilities of a restart strategy running on two processors to a restart strategy running on one processor. Again, 6 restarts were used to plot this graph. For increasing values of F(t), the log-scaled odds ratio appears to grow superlinearly, possibly even exponentially. For more than 6 restarts, the odds ratio increases even faster.
Fig 3 shows the log-scaled odds ratio of 2 to $2^5$ processors compared to a single process. The number of restarts and the probability are chosen such that $(1 - F(t))^k = 0.75$ holds. Again, the data appear to suggest a superlinear increase in the log-scaled odds ratio. However, the rate of increase depends on the values chosen for F(t) and k. If, for example, $(1 - F(t))^k$ is set to 0.9, then the log-scaled odds ratio for $2^5$ processors is about 5.53. Increasing the number of processors to $2^8$ yields a log-scaled odds ratio of about 29.17. For small values of $(1 - F(t))^k$, the log-scaled odds ratio increases much faster. Indeed, we show that the odds ratio increases superlinearly regardless of the value of F(t).
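The quoted values can be recomputed. Under the assumptions of Theorem 3.4 (identical restart times, no leftover time), write $q = (1 - F(t))^k$; then the single-processor probability is $1 - q$ and the n-processor probability is $1 - q^n$. The sketch below takes the log-scaled odds ratio as the natural logarithm of the odds ratio, which is our reading of the text, and uses `math.log1p` to avoid cancellation when $q^n$ is tiny.

```python
import math

def log_odds_ratio(q: float, n: int) -> float:
    """ln OR of n processors versus one processor, with q = (1 - F(t))**k and no leftover time."""
    qn = q ** n
    log_odds_parallel = math.log1p(-qn) - math.log(qn)   # ln((1 - q**n) / q**n)
    log_odds_single = math.log1p(-q) - math.log(q)       # ln((1 - q) / q)
    return log_odds_parallel - log_odds_single

for n in (2 ** 5, 2 ** 8):
    print(n, round(log_odds_ratio(0.9, n), 2))
```

This reproduces the values of about 5.53 for $2^5$ processors and about 29.17 for $2^8$ processors.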
Theorem 5.3. Let $0 < F(t) < 1$, with $n = 2^i$ for any $i \in \mathbb{N}$ and $p_2 = \Pr(A_t \le D)$, then .
This shows the claim.
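A short identity makes the superlinear behaviour plausible. It is our own derivation from the product form under the assumptions of Theorem 3.4, not a reconstruction of the statement or proof of Theorem 5.3. With $q = (1 - F(t))^k$, the odds ratio of n processors versus one is $q^{1-n}(1 - q^n)/(1 - q) = \sum_{m=0}^{n-1} q^{-m}$. Since $0 < q < 1$, every term with $m \ge 1$ exceeds 1, so the odds ratio exceeds n for all $n \ge 2$. The sketch checks the identity numerically; q = 0.75 matches the choice used for Fig 3.

```python
def odds_ratio(q: float, n: int) -> float:
    """OR of n processors versus one, where q = (1 - F(t))**k is the single-processor failure probability."""
    qn = q ** n
    return ((1.0 - qn) / qn) / ((1.0 - q) / q)

def odds_ratio_series(q: float, n: int) -> float:
    """Equivalent form sum_{m=0}^{n-1} q**(-m); each term with m >= 1 exceeds 1 when 0 < q < 1."""
    return sum(q ** (-m) for m in range(n))

for n in (1, 2, 4, 8, 16):
    print(n, odds_ratio(0.75, n), odds_ratio_series(0.75, n))
```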
We now turn to Theorem 3.5. Some additional assumptions have to be made to analyse this result. Since the restart times are non-decreasing, the value of $F(t_j)$ has to be evaluated separately for every $0 \le j \le i$. We used a bounded growth function to model these circumstances: $F(t_{j+1}) = F(t_j) + 0.01 \cdot (1 - F(t_j))$. Here, we only portray the lower bound of Theorem 3.5.
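The bounded growth model $F(t_{j+1}) = F(t_j) + 0.01 \cdot (1 - F(t_j))$ shrinks the failure probability by a constant factor per step, so $1 - F(t_j) = 0.99^j\,(1 - F(t_0))$. The sketch below generates the sequence. Using 0.00001, the value quoted for Fig 6, as the starting value $F(t_0)$ is our assumption.

```python
from typing import List

def bounded_growth(F0: float, steps: int, rate: float = 0.01) -> List[float]:
    """F(t_0), ..., F(t_steps) with F(t_{j+1}) = F(t_j) + rate * (1 - F(t_j))."""
    values = [F0]
    for _ in range(steps):
        values.append(values[-1] + rate * (1.0 - values[-1]))
    return values

values = bounded_growth(0.00001, 6)
# equivalently, 1 - F(t_j) = (1 - rate)**j * (1 - F(t_0))
print([round(v, 6) for v in values])
```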
Fig 4 displays the probability of a single process compared to the probability of multiple processors. The effect of parallelisation is clearly smaller than in the previous case. In fact, the probabilities for 12, 144 and 1728 processors differ notably only for very low values of F(t). As before, this graph was plotted for 6 restarts. If more restarts are allowed, all methods converge faster towards 1 and the region of notably different probabilities shifts further towards F(t) = 0.
Fig 5 shows the log-scaled odds ratio comparing 1 and 12 processors for 6 restarts. While the result of Theorem 3.4 implies unbounded growth, this plot shows that the log-scaled odds ratio starts at its maximum and then decreases monotonically. Two effects can be observed for a greater number of restarts: first, the log-scaled odds ratio converges much faster towards 0; second, its maximum decreases.
Finally, we analyse the log-scaled odds ratio for an increasing number of processors. Fig 6 shows this for 6 restarts and up to $2^6 \cdot 6^6$ processors for the case F(t) = 0.00001. For the first few steps, the log-scaled odds ratio increases roughly linearly, but then it stagnates. Stagnation sets in earlier both for a larger number of restarts and for a higher probability F(t).
Throughout this work, we have presented several interpretations of the completion probability of a restarted algorithm operating on several processors. In Theorem 3.4, we provided an explicit way to compare the completion probability of a restarted algorithm running on a single core with that of the same algorithm running on multiple cores in parallel. We then analysed this result for several values of F(t) and provided evidence that, under the assumptions of this theorem, the probability scales very well with the number of processors. In terms of the odds ratio, it even scales superlinearly, as shown in Theorem 5.3.
However, it is known that the optimal restart times (regarding the expected runtime) for a single processor and for multiple processors may differ. Therefore, the result of Theorem 3.5 is of special interest. There we assumed that the restart times are non-decreasing for an increasing number of processors. In the analysis, we provided evidence that, for the lower bound, a notable difference can only be achieved for low values of F(t), and that the result scales poorly with an increasing number of processors. Of course, since only the lower bound was evaluated, the actual values may be much better than the ones presented here.
We then examined two different notions of optimal restart times. First, we showed in Lemma 4.1 that for a unimodal hazard function (and a sublinear speedup ratio) the restart times are strictly increasing for an increasing number of processors. This lends significance to Theorem 3.5, which is applicable when the algorithm is optimised regarding its expected runtime. On the other hand, we showed in Theorem 4.2 that the optimal restart times regarding completion probability are equal for any number of processors; therefore, in this case Theorem 3.4 can be applied. However, it should be noted that finding the optimal restart time requires knowledge about the underlying runtime distribution, which can vary vastly depending on the input. Having access to the runtime distribution is often an unrealistic assumption in real applications. Therefore, it may be necessary to use non-optimal values as restart times and non-increasing values for the parallel case. Neither Theorem 3.4 nor Theorem 3.5 requires optimal restart times, which makes both results relevant for real applications.
S1 Table. Solved instances of the SAT problem.
The instance used was “uf250-04.cnf” from the SATLIB library [11].
S2 Table. Probabilities of the experiments.
Relative probabilities for a single processor and for four processors, together with the probability projected from the single-processor relative probability.
S1 Fig. Experimental data on the number of restarts.
The instance used was “uf250-04.cnf” from the SATLIB library [11]. The experiments were run on four processors in parallel. The depicted upper bound corresponds to Theorem 3.2. The data from this experiment differ from the previous data because the experiment had to be redesigned to measure the expected number of restarts.
The file “S1_File.zip” is the data gathered from the experiments regarding completion probability.
We thank Prof. Dr. Uwe Schöning and Prof. Dr. Hans Kestler for comments and advice which improved this manuscript. We also thank two anonymous reviewers for their comments on an earlier version of this work.
- Conceptualization: JL.
- Data curation: JL.
- Formal analysis: JL.
- Investigation: JL.
- Methodology: JL.
- Project administration: JL.
- Resources: JL.
- Software: JL.
- Supervision: JL.
- Validation: JL.
- Visualization: JL.
- Writing – original draft: JL.
- Writing – review & editing: JL.
- 1. M. Luby, A. Sinclair, and D. Zuckerman. Optimal speedup of Las Vegas algorithms. Information Processing Letters, 47(4):173–180. Elsevier, 1993. doi: 10.1016/0020-0190(93)90029-9.
- 2. A. P. van Moorsel and K. Wolter. Analysis and algorithms for restart. In Proceedings of the First International Conference on the Quantitative Evaluation of Systems (QEST'04), pages 195–204, 2004.
- 3. M. Luby and W. Ertel. Optimal parallelization of Las Vegas algorithms. In STACS 94, volume 775 of Lecture Notes in Computer Science, pages 461–474. Springer, 1994.
- 4. O. V. Shylo, T. Middelkoop, and P. M. Pardalos. Restart strategies in optimization: parallel and serial cases. Parallel Computing, 37(1):60–68. Elsevier, 2011. doi: 10.1016/j.parco.2010.08.004.
- 5. A. A. Cire, S. Kadioglu, and M. Sellmann. Parallel restarted search. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI'14), pages 842–848. AAAI Press, 2014.
- 6. R. Battiti, M. Brunato, and F. Mascia. Reactive Search and Intelligent Optimization, volume 45 of Operations Research/Computer Science Interfaces Series. Springer, 2009.
- 7. O. V. Shylo, O. A. Prokopyev, and J. Rajgopal. On algorithm portfolios and restart strategies. Operations Research Letters, 39(1):49–52. Elsevier, 2011. doi: 10.1016/j.orl.2010.10.003.
- 8. A. P. van Moorsel and K. Wolter. Meeting Deadlines through Restart. In MMB & PGTS 2004, pages 155–160. VDE Verlag, 2004.
- 9. H. Wu. Randomization and Restart Strategies. Master's thesis, University of Waterloo, 2006.
- 10. D. J. Sheskin. Handbook of Parametric and Nonparametric Statistical Procedures, Fifth Edition. Chapman and Hall/CRC, 2011. doi: 10.1201/9781420036268.
- 11. H. H. Hoos and T. Stützle. SATLIB: An Online Resource for Research on SAT. In I. P. Gent, H. v. Maaren, and T. Walsh, editors, SAT 2000, pages 283–292. IOS Press, 2000. SATLIB is available online at www.satlib.org.