Mission Availability for Bounded-Cumulative-Downtime System

In this research, a mathematical model is proposed to describe the mission availability of a bounded-cumulative-downtime system. In the proposed model, the cumulative downtime and the cumulative uptime are treated as simultaneous constraints. Mission availability is defined as the probability that the cumulative downtime of all repairs does not exceed the bounded cumulative downtime before the required cumulative uptime has accrued. Two mutually exclusive cases contribute to this probability. In the first, the system does not fail, and the probability is given by the system reliability. In the second, the system fails but the cumulative downtime does not exceed the constraint before the cumulative uptime has accrued. An exact mathematical description of the second case is very complex. The cumulative downtime in a mission is therefore treated as a random variable whose cumulative distribution gives the probability that the failed system is restored to the operating state. Taking into account the dependence between scheduled missions, a mission availability model with a closed-form expression is proposed under this assumption. Numerical simulations are presented to illustrate the effectiveness of the proposed model. The results indicate that the relative errors are acceptable and the proposed model is effective. Furthermore, three important applications of the proposed mission availability model are discussed.


Introduction
Background and motivation
Consider a repairable system that performs one mission type in the same operational environment and is repaired immediately after an operational failure. During the mission, short downtimes can be tolerated, but the cumulative downtime must not exceed a bounded cumulative downtime before the required cumulative uptime has accrued. If the cumulative downtime exceeds this bound, the mission fails and a penalty is incurred. Systems of this kind can be found in the nuclear and food industries, and a special system of this kind was examined by Gupta et al. [1]. Two common dependability measures receive close attention: reliability and availability. Reliability, defined as the probability that the system remains operational over an observation period, is an appropriate measure for evaluating the effectiveness of systems where no downtime can be tolerated [2,3]. Availability, defined as the probability that the system is operating satisfactorily at any point in time under stated conditions, is a more appropriate measure for systems that are usually operated continuously and for which short downtimes can be tolerated during operation [3][4][5]. The choice of a dependability measure often requires a trade-off between these two measures [6][7][8][9][10][11].
Many categories of availability are defined in the literature according to different definitions of uptime and downtime, such as interval availability [12], achieved availability [13], and steady-state availability [14]; a detailed overview can be found in [5]. Availability applications can also be found in production planning, maintenance scheduling and so on [9][10][11][12][13][14]. However, for the bounded-cumulative-downtime system, the cumulative downtime and the cumulative uptime must be considered simultaneously. The existing availability models cannot describe its availability characteristics exactly. It is therefore important to establish a model for availability analysis of the bounded-cumulative-downtime system.

Literature overview
A review of the literature shows that some early studies addressed the availability analysis of the bounded-cumulative-downtime system, and the most appropriate measure is mission availability [15]. Corresponding to the failure constraint, mission availability can be described as the probability that the downtime (cumulative downtime or cumulative number of failures in a mission) does not exceed the bounded downtime (bounded cumulative downtime or bounded cumulative number of failures) constraint before the total operating time has accrued [1,16]. Although it is important in applications where bounded system downtime can be tolerated [16,17], mission availability has received little study.
For the bounded downtime, Birolini [14] proposed a closed-form expression of the mission availability for a system modeled by an alternating renewal process. Birolini calculated the mission availability by summing over all the possibilities of having n failures (n = 0, 1, 2, …) during the total operating time. Csenki [15] modeled the mission availability with a semi-Markov process and also derived a closed-form solution. However, as Csenki admitted, this solution is not suitable for computational work. Kodama et al. [18] analyzed system mission reliability for a one-unit system with allowed downtime. Gupta et al. [16,17] discussed a two-unit cold standby system with bounded downtime in which each unit can work in three modes. They analyzed the system using regeneration points and discussed mean time to system failure, point availability and steady-state availability in addition to mission availability. Similarly, Dunbar [19] presented an expression for the probability of failure of a system consisting of two components, where the system is declared failed only when both components have failed and remained in the failed state for at least a given finite time. However, in these studies the total operating time is a constant. Furthermore, the cumulative operating time is random because the number of failures and the downtime of each individual failure are random.
For the constraint on cumulative downtime, Birolini [11] also proposed a closed-form expression of the mission availability. Gao and Zhu [17] proposed a simulation algorithm for cluster system mission availability. Nicola and Bobbio [18] discussed a unified performance and reliability analysis of a system that alternates between an up state and a down state. The system reaches a catastrophic condition when the cumulative downtime exceeds a critical threshold, and a mission is completed when a specified amount of work is done before the system reaches the critical threshold. Preemptive-resume and preemptive-repeat failures were both considered. Based on a Markov model, closed-form expressions for unified performance measures, such as system lifetime, mission reliability, interval availability and instantaneous availability, were obtained. However, the mission availability was not considered. In this case, the total operating time is again defined as a constant. Furthermore, the cumulative operating time is not constrained, since the number of failures and the failure time of each individual failure are random. Although the cumulative operating time was constrained in Gao et al. [17] and Nicola et al. [18], a closed-form expression of mission availability was not given. Goyal and Nicola et al. [20] discussed the constraint of a bounded number of downtimes. In their study, preemptive-resume and preemptive-repeat failures were also considered. However, only the expressions of system lifetime and the probability of mission completion were specified.

Objective and outline
In the existing studies, the alternating renewal process and simulation methods are widely used to analyze mission availability, and the total operating time and the cumulative downtime were seldom considered as simultaneous constraints. To the best of our knowledge, only Gao et al. [21] and Nicola et al. [22] considered the total operating time and the cumulative downtime as constraints simultaneously, and neither gave a closed-form expression of mission availability. As an attempt, we treat all failures during a mission as a single assumed failure whose cumulative downtime equals the sum of the downtimes of all failures. In addition, we treat the cumulative downtime and the cumulative uptime as simultaneous constraints. Mission availability can then be described as the probability that the cumulative downtime does not exceed the bounded cumulative downtime before the cumulative uptime has accrued. This probability covers two mutually exclusive cases. In the first, the system has not failed, and the probability is given by the system reliability. In the second, the system has failed and the cumulative downtime does not exceed the constraint before the cumulative uptime has accrued. In the present study, we use the cumulative downtime distribution of the assumed failure to describe this probability. A mission availability model is proposed under this assumption, taking into account the dependence between scheduled missions. Numerical examples are presented to illustrate the effectiveness of the proposed model. Three important applications of the proposed mission availability model, including design and optimum analysis and mission scheduling, are discussed.
Compared with the existing literature, our study differs in the following respects: (1) Unlike [14][15][16][17][18], the cumulative uptime and the cumulative downtime are considered as constraints simultaneously. (2) The proposed mission availability model has a closed-form expression. Although Gao et al. [20] and Nicola et al. [18] considered the cumulative uptime and the cumulative downtime as constraints simultaneously, a closed-form expression of mission availability was not given.

Problem formulation
Consider a system that performs one mission type in the same operational environment and is repaired immediately upon an operational failure. To execute a mission successfully, the system must work for a cumulative $T_o$ units of time while satisfying the cumulative downtime constraint. In other words, the cumulative downtime cannot exceed the bounded cumulative downtime before the cumulative uptime $T_o$ has accrued. The execution process of system missions is displayed in Figure 1, where $X_{ij}$ denotes the uptime between failures during the $i$-th mission and $Y_{ij}$ denotes the downtime of the $j$-th failure during the $i$-th mission. Let $Z_{di}$ denote the cumulative downtime during the $i$-th mission. It is equal to the sum of the downtimes of all failures during the $i$-th mission, and is calculated as:

$$Z_{di} = \sum_{j} Y_{ij}$$

Assumptions
In the extant literature, modeling $Z_{di}$ has proved very difficult in mission availability analysis. The alternating renewal process and simulation methods are widely used to model or simulate $Z_{di}$. In the present paper, we use an approximate method to solve this problem. Unlike the alternating renewal process and simulation methods, we treat $Z_{di}$ as a random variable whose approximate distribution can be determined by statistical methods. In other words, we assume that the approximate distribution of $Z_{di}$ during the $i$-th mission can describe the failure behavior. The additional assumptions are as follows.
(1) System downtime contains the direct repair time and indirect waiting time (e.g. the time spent on failure detection, failure diagnosis and preparing spare parts) [23,24].
(2) Differences among component failures are not considered.
(3) Repair is perfect, and the system is restored to the state as good as new [25].
(4) Under assumptions (2) and (3), $X_{ij}$, $Y_{ij}$ and $Z_{di}$ are independent and identically distributed [26]. We therefore denote the cumulative failure distribution (CDF for short) of $X_{ij}$, the cumulative failure distribution of $Y_{ij}$ and the cumulative repair distribution (CRF for short) of $Z_{di}$ by $F(x)$, $\psi(y)$ and $W(z)$, respectively. Here $F(x)$ is the probability that the system fails before its cumulative uptime reaches $x$; $\psi(y)$ is the probability that the system is restored to the state as good as new before the downtime reaches $y$; and $W(z)$ is the probability that the system is restored to the state as good as new before the cumulative downtime reaches $z$.

Model Development
According to the background and assumptions above, the proposed mission availability model can be described as shown in Figure 2. Mission availability is defined as the probability that the system cumulative downtime does not exceed $t_d^*$ before the cumulative uptime $T_o^*$ has accrued. Because a failure in the last mission may delay the start of the following $n$ missions, the delay-start time can be decomposed into two scenarios [27]. Scenario 1: the following missions are not delayed, which means the system is in the operating state when the next mission starts. Scenario 2: the following missions may be delayed. We refer to them as mission independence and mission dependence, as shown in Figure 1 a) and b) respectively. Let the delay-start time of the $i$-th mission be $t_i$. Then $t_i = 0$ means the scheduled mission is independent; otherwise, the scheduled mission is dependent.
The mathematical description is given next.

Scenario 1
Mission independence means that the system is in the operating state when the mission starts. Therefore, the mission availability, denoted as $MA$, can be defined as [14]:

$$MA(t_d^*) = \Pr\{\text{the system cumulative downtime that occurs before the cumulative uptime } T_o^* \text{ has accrued} \le t_d^*\} \quad (1)$$

In equation (1), two mutually exclusive cases in which the system executes a mission successfully must be considered. Case 1: the system operates $T_o^*$ units of time without a failure. Case 2: the system encounters failures, but the cumulative downtime does not exceed the bounded cumulative downtime before the cumulative uptime $T_o^*$ has accrued.

For Case 1, the system has not failed during $[0, T_o^*]$, so the probability is the system reliability $1 - F(T_o^*)$. For Case 2, the system has failed, but the system cumulative downtime does not exceed the bounded cumulative downtime before the cumulative uptime has accrued. According to assumption (2), if there is a failure during the $i$-th mission, the system must fail in the time interval $[0, T_o^*]$, which occurs with probability $F(T_o^*)$. The probability that the system can be restored to the state as good as new before the cumulative downtime reaches $t_d^*$ is $W(t_d^*)$. So the probability that the system has failed but the cumulative downtime does not exceed the bounded cumulative downtime before the cumulative uptime has accrued is $F(T_o^*)\,W(t_d^*)$, and the mission availability can be expressed as:

$$MA(t_d^*) = 1 - F(T_o^*) + F(T_o^*)\,W(t_d^*) \quad (2)$$

Scenario 2
For the scenario of mission dependence, the following $n$ missions may be delayed because of a long downtime during the $i$-th mission. Therefore, the probability that the scheduled mission can start successfully must be considered in two mutually exclusive cases.

Case 1: […], the scheduled mission can start, but the bounded cumulative downtime is reduced to $iT + t_d^* - t_i$. The probability that the system cumulative downtime does not exceed $iT + t_d^* - t_i$ before the cumulative uptime $T_o^*$ has accrued is denoted as $MA_1$:

$$MA_1 = \Pr\{\text{the cumulative downtime that occurs before the cumulative uptime } T_o^* \text{ has accrued} \le iT + t_d^* - t_i\}$$

Let $p(t_i)$ denote the probability that $t_i$ takes any value in […]. For the discrete random variable $t_i$, let $p(t_i)$ denote the probability that $T_i = t_i$, and let $W(t_i)$ be the probability that $T_i \le t_i$. The relation between $p(t_i)$ and $W(t_i)$ is given by:

[…]

Following equation (2), the probability that the system cumulative downtime does not exceed $t_d^* - t_i$ before the cumulative uptime $T_o^*$ has accrued can be represented as:

[…]

[…] and the system cumulative downtime does not exceed $t_d^* - t_i$ before the cumulative uptime $T_o^*$ has accrued, then the probability can be computed as:

[…]

Case 2: the system has not failed during the last mission, or the system has been restored to the state as good as new before the next mission starts. The probability that the system cumulative downtime does not exceed $t_d^*$ before the cumulative uptime $T_o^*$ has accrued is denoted as $MA_2$ and calculated by:

[…]

The probability of no failure is $1 - F(T_o^*)$. The probability that $t_i = 0$ can be represented as:

[…]

So the probability of no failure or $t_i = 0$ is:

[…]

According to equations (2) and (6), we get:

[…]

Adding $MA_1$ and $MA_2$, the mission availability under mission dependence becomes:

[…] (9)

The mission availability model has now been established. The modeling process is discussed further below.
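As a minimal sketch of the mission-independence case described above, the two mutually exclusive cases can be combined numerically: no failure, with probability $1 - F(T_o^*)$, or a failure followed by restoration within the bound, with probability $F(T_o^*)\,W(t_d^*)$. The exponential distributions, the rates, and the function names below are assumptions chosen for illustration only; they are not taken from the paper.

```python
import math
from typing import Callable


def mission_availability(F: Callable[[float], float],
                         W: Callable[[float], float],
                         T_o: float, t_d: float) -> float:
    """Scenario 1 (mission independence): MA = 1 - F(T_o) + F(T_o) * W(t_d)."""
    p_fail = F(T_o)        # probability of at least one failure in [0, T_o]
    p_restore = W(t_d)     # probability cumulative downtime stays within t_d
    return (1.0 - p_fail) + p_fail * p_restore


def exponential_cdf(rate: float) -> Callable[[float], float]:
    """Return the CDF t -> 1 - exp(-rate * t); an illustrative choice only."""
    return lambda t: 1.0 - math.exp(-rate * t) if t > 0 else 0.0


# Hypothetical inputs: failure rate 0.01, cumulative-repair rate 0.05.
F = exponential_cdf(0.01)
W = exponential_cdf(0.05)
ma = mission_availability(F, W, T_o=100.0, t_d=40.0)
print(f"MA = {ma:.4f}")
```

With a zero downtime bound the function returns the reliability alone, and as the bound grows the result approaches 1, which matches the qualitative behavior described for the model.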

Parameter estimation and goodness-of-fit test
According to equations (2) and (9), the critical modeling step for the mission availability is to determine $F(x)$ and $W(z)$. In practice, $F(x)$ and $W(z)$ can be obtained by fitting common models, such as the Weibull, Exponential, Gamma and Lognormal models, to the field failure and repair data. The Maximum Likelihood method can be used to estimate the parameters. Taking the Weibull model as an example, the likelihood function can be presented as:

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$$

which is also given by:

$$\ln[L(\theta)] = \sum_{i=1}^{n} \ln f(x_i; \theta)$$

where $f(\cdot)$ is the probability density function of the fitted model. The model parameters can be obtained by directly maximizing $\ln[L(\theta)]$. The Chi-Squared test can be used to test the goodness of fit, as given by Blischke and Murthy [2]:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

Taking the Weibull model as an example, $E_i$ can be presented as:

[…]

where $n$ is the sample size, with each observation falling into one of $k$ possible classes, $O_i$ is the observed frequency in class $i$, and $E_i$ is the expected frequency. A rule for determining $k$ is that the expected frequency should satisfy $E_i \ge 5$; classes with $E_i < 5$ are combined. The smaller $\chi^2$ is, the better the fitted model.
Thus, once the model parameters of $F(x)$ and $W(z)$ are determined, the mission availability can be calculated. Numerical simulations are presented next to verify the rationality of the assumption and the effectiveness of the proposed model.
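The fitting and goodness-of-fit steps can be sketched as follows for the Weibull case. This is an illustration under stated assumptions, not the authors' implementation: the shape-scale parameterization, the bisection solver, the use of equal-probability classes (so that every $E_i = n/k$), the synthetic sample, and all function names are choices made here.

```python
import math
import random


def fit_weibull_mle(data, lo=0.01, hi=50.0, tol=1e-9):
    """Maximum-likelihood Weibull fit; returns (shape, scale).

    The shape solves the profile-likelihood equation by bisection;
    the scale then follows in closed form.
    """
    n = len(data)
    x_max = max(data)
    u = [x / x_max for x in data]            # rescale so u**k cannot overflow
    mean_log = sum(math.log(v) for v in u) / n

    def score(k):
        s0 = sum(v ** k for v in u)
        s1 = sum((v ** k) * math.log(v) for v in u)
        return s1 / s0 - 1.0 / k - mean_log  # increasing in k, zero at the MLE

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = x_max * (sum(v ** shape for v in u) / n) ** (1.0 / shape)
    return shape, scale


def weibull_cdf(x, shape, scale):
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0


def chi_squared_stat(data, shape, scale, k=10):
    """Chi-squared statistic with k equal-probability classes (E_i = n / k)."""
    n = len(data)
    expected = n / k
    observed = [0] * k
    for x in data:
        # Class i covers fitted-CDF values in [i/k, (i+1)/k).
        i = min(int(weibull_cdf(x, shape, scale) * k), k - 1)
        observed[i] += 1
    return sum((o - expected) ** 2 / expected for o in observed)


random.seed(1)
# Synthetic data with hypothetical true parameters: scale 100, shape 1.5.
sample = [random.weibullvariate(100.0, 1.5) for _ in range(5000)]
shape_hat, scale_hat = fit_weibull_mle(sample)
chi2 = chi_squared_stat(sample, shape_hat, scale_hat)
print(f"shape={shape_hat:.3f} scale={scale_hat:.2f} chi2={chi2:.2f}")
```

Equal-probability classes keep every expected frequency at $n/k$, which satisfies the $E_i \ge 5$ rule whenever $n \ge 5k$; other class boundaries would work equally well.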

Numerical Examples
Monte Carlo (MC) simulation is a commonly used simulation method [28] and is adopted in our numerical simulations. In the proposed mission availability model, the key assumption is the rationality of obtaining $W(t)$. The simulation examples therefore focus on this assumption, and four common models are used to test the cumulative repair distribution. The simulation framework can be summarized as follows.
(1) Define $T$, $T_o^*$, $t_d^*$ and the number of simulation runs $k$, where $T$, $T_o^*$ and $t_d^*$ are constants (these can be determined by the operation manager or by optimization).
(2) Use MC simulation to simulate the execution process of system missions shown in Figure 1 with given $F(x)$ and $\psi(y)$. The detailed MC simulation procedures for mission independence and dependence are displayed in Figure 3 and […]. To ensure the credibility of the simulation examples, the following are considered in setting the variables.
The detailed simulation variables are listed in Table 1. The number of simulation runs, denoted as $k$ in Figure 3, is set to 50000. The simulation yields the simulated mission availability values and the downtime data set $\{Z_{di}, i = 1, 2, \ldots, k\}$. The common models are fitted to the downtime data set, with parameters estimated by the Maximum Likelihood method and goodness of fit assessed by the Chi-Squared test, from which the appropriate model is determined. The mission availability values from the proposed model and the relative errors are then calculated. The mean relative errors are displayed in Table 1. The simulation results show that almost all the mean relative errors are less than 1.5%, which is generally acceptable [29]. The maximum percent error in the mission availability estimate is 1.46%, which implies no noticeable impact on the actual mission availability. So the assumption that takes the cumulative downtime in a mission as a random variable to model the mission availability is rational, and the proposed model is effective. In the proposed model, the estimation error may come from the fitted $F(x)$ and $W(z)$. If more accurate results are required, future research should focus on fitting more accurate models of $F(x)$ and $W(z)$.
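A simplified version of the mission-independence simulation can be sketched as below. It is an assumption-laden illustration rather than the procedure of Figure 3: uptimes and downtimes are taken as exponential, the rates and bounds are hypothetical (Table 1 is not reproduced), and the empirical distribution of the simulated cumulative downtimes stands in for the fitted model $W(z)$.

```python
import math
import random


def simulate_mission(fail_rate, repair_rate, T_o, rng):
    """Return the cumulative downtime of one mission (0.0 if no failure)."""
    uptime, downtime = 0.0, 0.0
    while True:
        uptime += rng.expovariate(fail_rate)      # X_ij: time to next failure
        if uptime >= T_o:                         # required uptime has accrued
            return downtime
        downtime += rng.expovariate(repair_rate)  # Y_ij: repair of this failure


def run(k=20000, fail_rate=0.01, repair_rate=0.05, T_o=100.0, t_d=40.0, seed=7):
    rng = random.Random(seed)
    z = [simulate_mission(fail_rate, repair_rate, T_o, rng) for _ in range(k)]
    ma_sim = sum(1 for v in z if v <= t_d) / k    # simulated mission availability

    # Model side: F from the assumed exponential uptime, W as the empirical
    # CDF of cumulative downtime over missions with at least one failure.
    failed = [v for v in z if v > 0.0]
    W_td = sum(1 for v in failed if v <= t_d) / len(failed)
    F_To = 1.0 - math.exp(-fail_rate * T_o)
    ma_model = (1.0 - F_To) + F_To * W_td
    rel_err = abs(ma_model - ma_sim) / ma_sim
    return ma_sim, ma_model, rel_err


ma_sim, ma_model, rel_err = run()
print(f"simulated={ma_sim:.4f} model={ma_model:.4f} relative error={rel_err:.2%}")
```

Replacing the empirical `W_td` with a parametric fit (Weibull, Exponential, Gamma or Lognormal) would follow the paper's procedure more closely and would expose the fitting error that the relative-error comparison is meant to measure.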

Model Applications
In this section, three important applications of the proposed mission availability model are discussed. First, the model can be used to carry out mission availability analysis, from which the relationship among reliability, maintainability and mission availability is obtained. Second, the model can be used in system design and optimum analysis: the optimal levels or the no-saturation interval of reliability and maintainability can be determined when the mission availability is set as a required value. Finally, a cost function is given to determine the optimal bounded cumulative downtime in mission scheduling. The proposed model may also have other potential applications, for example reliability and maintainability allocation, determining the direction and target of reliability improvement, maintenance resource optimization and scheduled maintenance. These potential applications will be researched further in future work.

Mission availability analysis
Mission availability combines reliability and maintainability characteristics, and the relationship among them can be obtained through mission availability analysis. For the scenario of mission independence, the relationship among reliability, maintainability and mission availability follows from a rearrangement of (2):

$$MA(t_d^*) = \left[1 - F(T_o^*)\right] + F(T_o^*)\,W(t_d^*)$$

where $1 - F(T_o^*)$ is the reliability and $W(t_d^*)$ is the maintainability. Clearly, if any two of the three measures are known, the third is easy to calculate. The relationship among system reliability, maintainability and mission availability is displayed in Figure 5. The mission availability under the scenario of mission dependence can be analyzed in the same way. We give only a numerical example with exponential failure and repair distributions here; other failure and repair distributions can be treated similarly. The relationship among system reliability, maintainability and mission availability is displayed in Figure 5 (b) with exponential failure rates 0.01(0.01)0.1 and exponential repair rates 0.002(0.002)0.02, that is, ranging from the first value to the last in steps of the value in parentheses.
The lognormal model is widely used for modeling maintainability, so two numerical cases with the lognormal model are presented below. The numerical results are displayed in Figure 6.
To investigate the relationship with system reliability, six different values of $F(T_o^*)$ were examined (namely 0.4(0.1)0.9), where $W(z)$ is a Lognormal distribution with $\mu = 3.4$, $\sigma = 0.95$. Figure 6 shows that, for a given $W(z)$, the system mission availability increases as the bounded cumulative downtime increases. If the bounded cumulative downtime is long enough, the mission availability approaches 1. In addition, the higher the reliability level, the shorter the bounded downtime needed for the mission availability to reach a given value. Furthermore, we set $W(z)$ as a Lognormal model with $\mu = 3.0(0.5)4.5$, $\sigma = 1.0$ to analyze the system mission availability, where $1 - F(T_o^*) = 0.4$. Again, the system mission availability increases with the bounded cumulative downtime and approaches 1 when the bounded cumulative downtime is long enough.
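The trends described for the lognormal case can be checked with a short sketch. It assumes the mission-independence form, reliability plus unreliability times $W(t_d^*)$, and uses the lognormal parameters quoted above; the function names and the grid of downtime bounds are illustrative choices.

```python
import math


def lognormal_cdf(z, mu, sigma):
    """CDF of a lognormal distribution, used here as W(z)."""
    if z <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(z) - mu) / (sigma * math.sqrt(2.0))))


def mission_availability(reliability, t_d, mu=3.4, sigma=0.95):
    """MA = R + (1 - R) * W(t_d), with R = 1 - F(T_o*), mission independence."""
    return reliability + (1.0 - reliability) * lognormal_cdf(t_d, mu, sigma)


def maintainability_from(ma, reliability):
    """Rearranged form: recover W(t_d) when MA and R are known (R < 1)."""
    return (ma - reliability) / (1.0 - reliability)


for r in (0.4, 0.6, 0.8):
    row = [round(mission_availability(r, t), 3) for t in (10, 30, 60, 120, 240)]
    print(f"R={r}: {row}")
```

Each printed row increases with the downtime bound and tends toward 1, and rows for higher reliability lie above rows for lower reliability, in line with the behavior reported for Figure 6.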

Design and optimum analysis
Determining the optimal levels of reliability and maintainability when the mission availability is set as a constraint is an interesting optimization problem that arises widely in system reliability design and reliability improvement. Figure 7 shows the trend of reliability and maintainability when the system mission availability is set as a constraint. As the system reliability increases, the system maintainability decreases only slightly until the reliability reaches a certain value $R_1$. In contrast, the maintainability decreases rapidly after the reliability reaches another value $R_2$. The intervals $(0, R_1)$ and $(R_2, 1]$ are the saturation intervals, and the interval $[R_1, R_2]$ is the no-saturation interval. This phenomenon is defined as the saturation effect [30]. The linear approximation [31] is used to determine the no-saturation interval. Firstly, we use the piecewise-linear model [31,32] […] System performance and cost are acceptable only when the optimal values of reliability and maintainability fall in the no-saturation interval. To achieve the optimum, the maintainability and reliability designs must be traded off according to the total cost when the system mission availability is set as a constraint.
Assume that the design cost can be expressed as a function of maintainability and reliability. The optimal design values can then be calculated by trading off the costs. Let the current system mission availability level be $MA_a(t_d^*)$, denoted as MA in Figure 7, and let the expected system mission availability level be $B$. The required system mission availability can be met by increasing the system reliability or the maintainability. In this case, an economic analysis is important for determining the design or improvement level of reliability and maintainability. Set $x = F(T_o^*)$ and $y = W(z)$, where $C_x(x)$ and $C_y(y)$ are the costs of $x$ and $y$ respectively. Then the optimal reliability and maintainability levels can be determined by minimizing the total cost $C$:

$$C = C_x(x) + C_y(y)$$

where $x$ and $y$ must satisfy:

[…]

The optimal reliability and maintainability levels can be determined when $C_x(x)$, $C_y(y)$ […]

Mission scheduling
Consider a system that executes production missions cyclically with a fixed reliability level. The required mission availability can then be reached only by increasing the bounded cumulative downtime [33]. Taking Figure 6 as an example, if the system mission availability is required to reach 0.85 and the reliability is 0.4, 0.5, 0.6, 0.7 and 0.8, the required bounded downtime cannot be less than 58, 50, 41, 31 and 16 units of time respectively. If the mission availability also needs to be optimized, cost or another criterion can be considered. Here, cost is used as the optimization criterion.
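The required bounded downtime for a target mission availability can also be computed directly rather than read from a plot. The sketch below assumes the mission-independence form with a lognormal $W(z)$ ($\mu = 3.4$, $\sigma = 0.95$, as in the Figure 6 example) and inverts it for $t_d^*$; the function name and its edge-case handling are illustrative assumptions.

```python
import math
from statistics import NormalDist


def required_downtime(ma_target, reliability, mu=3.4, sigma=0.95):
    """Smallest t_d with R + (1 - R) * W(t_d) >= ma_target, W lognormal."""
    if ma_target <= reliability:
        return 0.0                       # reliability alone meets the target
    w_needed = (ma_target - reliability) / (1.0 - reliability)
    if w_needed >= 1.0:
        return math.inf                  # target unreachable for finite t_d
    z = NormalDist().inv_cdf(w_needed)   # standard normal quantile
    return math.exp(mu + sigma * z)      # lognormal quantile


for r in (0.4, 0.5, 0.6, 0.7, 0.8):
    print(f"R={r}: t_d >= {required_downtime(0.85, r):.1f}")
```

Under these assumptions the computed bounds come out at roughly 57, 49, 41, 30 and 16 units of time, close to the values of 58, 50, 41, 31 and 16 quoted from Figure 6; the small differences are consistent with reading values off a plotted curve.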
Assume that the profit of a successfully performed production mission is $C_p$; if the system is unavailable for a mission, the profit $C_p$ is lost. The expected unavailability cost $C_1$ can be computed by:

$$C_1 = C_p\left[1 - MA(t_d^*)\right]$$

where $MA(t_d^*)$ is the system mission availability. In addition to the unavailability cost, the cost of downtime also affects the profit. Assuming the cost of downtime increases linearly with the cumulative downtime, we have:

$$C_2 = C_d\,t_d^*$$

where $C_d$ is the expected downtime cost per unit of time. Thus, the total cost is:

$$C = C_p\left[1 - MA(t_d^*)\right] + C_d\,t_d^* \quad (18)$$

The mission availability is an increasing function of the bounded cumulative downtime, so the first term on the right side of (18) decreases with $t_d^*$ while the second term increases with $t_d^*$. Hence, $t_d^*$ should be selected optimally to minimize the total cost in (18).
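The trade-off between the two cost terms can be illustrated with a simple grid search. This sketch assumes the total cost is the lost profit $C_p$ weighted by the unavailability $1 - MA(t_d^*)$ plus a downtime cost $C_d\,t_d^*$ that is linear in the bound, with a lognormal $W(z)$ and mission independence; the cost values, the grid, and the function names are hypothetical.

```python
import math


def lognormal_cdf(z, mu, sigma):
    if z <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(z) - mu) / (sigma * math.sqrt(2.0))))


def total_cost(t_d, c_p, c_d, reliability, mu, sigma):
    """Expected unavailability cost plus linear downtime cost."""
    ma = reliability + (1.0 - reliability) * lognormal_cdf(t_d, mu, sigma)
    return c_p * (1.0 - ma) + c_d * t_d


def optimal_downtime(c_p, c_d, reliability, mu=3.4, sigma=0.95,
                     t_max=300.0, step=0.1):
    """Grid search for the bound t_d that minimizes the total cost."""
    n_steps = int(round(t_max / step))
    grid = [i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda t: total_cost(t, c_p, c_d, reliability, mu, sigma))


# Hypothetical costs: profit 1000 per mission, downtime cost 2 per unit time.
t_opt = optimal_downtime(c_p=1000.0, c_d=2.0, reliability=0.4)
c_opt = total_cost(t_opt, 1000.0, 2.0, 0.4, 3.4, 0.95)
print(f"optimal t_d ~ {t_opt:.1f}, total cost ~ {c_opt:.1f}")
```

A grid search is used because the total cost is not convex over the whole range for a lognormal $W(z)$: it can rise briefly near zero before falling, so a purely local method started at a small bound could stop too early.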

Limitations of the Study, Open Questions, and Future Work
The aim of this research is to present an approximate method for modeling the mission availability of a bounded-cumulative-downtime system. Although good results have been obtained, the current study has some limitations. First, we assume that repair is perfect and that the system is restored to the state as good as new. However, the repair of a repairable system may be imperfect. Moreover, the mean time between failures decreases with increasing usage, while the cumulative downtime increases. The cumulative downtimes are then not independent and identically distributed, and the traditional reliability models cannot be used to model them. The existing repairable-system reliability models should be incorporated into the proposed mission availability model. More research is needed to expand the application scope of the proposed model.
Second, the structural dependence and importance of components are not considered in the proposed model. Uptimes and downtimes are used to measure the system reliability and maintainability. However, components differ in importance and in their impact on system performance. Modeling uptimes and downtimes while accounting for structural dependence and component importance therefore deserves more attention in further research.
Third, system reliability and maintainability are obtained by fitting field failure and repair data. For a new system or one with a short operation history, the field failure and repair data may be insufficient to support accurate modeling. In the numerical simulations, the maximum percent error in the mission availability estimate is as high as 1.46%. Although this implies no noticeable impact on the actual mission availability, more research is needed on small-sample data to obtain more accurate results.
Finally, the proposed model may have other potential applications, for example reliability and maintainability allocation, determining the direction and target of reliability improvement, maintenance resource optimization and scheduled maintenance. These potential applications will be researched further in future work.

Conclusion
In this research, an approximate method was used to model the mission availability of a bounded-cumulative-downtime system. All failures in a single mission are treated as one total failure whose cumulative downtime is equal to the sum of the downtimes of all failures. The approximate distribution of this cumulative downtime was determined and used to develop the proposed mission availability model. In the proposed model, the cumulative downtime and the cumulative uptime are set as constraints simultaneously. Numerical simulations were presented to illustrate the rationality of the assumption and the effectiveness of the proposed model. The maximum percent error in the mission availability estimate is 1.46%, which implies no noticeable impact on the actual mission availability. Given the acceptable relative errors, the proposed mission availability model is effective, and the assumption that takes the cumulative downtime as a random variable to model the mission availability is rational.
Because of its closed-form expression, the proposed mission availability model can be widely adopted. Three important applications were discussed, with numerical examples to illustrate the application process. For mission availability analysis, the relationship among reliability, maintainability and mission availability can be obtained. For design and optimum analysis, the no-saturation interval is given and a method for determining the optimal reliability and maintainability levels is proposed. For mission scheduling, a method for determining the optimal cumulative downtime by minimizing cost is suggested.