A stochastic hybrid systems based framework for modeling dependent failure processes

In this paper, we develop a framework to model and analyze systems that are subject to dependent, competing degradation processes and random shocks. The degradation processes are described by stochastic differential equations, whereas transitions between the system discrete states are triggered by random shocks. The modeling is, then, based on Stochastic Hybrid Systems (SHS), whose state space is comprised of a continuous state determined by stochastic differential equations and a discrete state driven by stochastic transitions and reset maps. A set of differential equations are derived to characterize the conditional moments of the state variables. System reliability and its lower bounds are estimated from these conditional moments, using the First Order Second Moment (FOSM) method and Markov inequality, respectively. The developed framework is applied to model three dependent failure processes from literature and a comparison is made to Monte Carlo simulations. The results demonstrate that the developed framework is able to yield an accurate estimation of reliability with less computational costs compared to traditional Monte Carlo-based methods.


Introduction
Failure of industrial components, systems and products may be caused by multiple failure processes, e.g. wear, corrosion, erosion, creep, fatigue, etc. [1]. In general, the failure processes are categorized as degradation processes (or soft failures) and catastrophic failure processes (or hard failures) [2]. Soft failure is caused by continuous degradation and is often modeled by a continuous-state random process, e.g., Wiener process [3,4], Gamma process [5][6][7], inverse Gaussian process [8][9][10], continuous-time semi Markov process [11], etc. Hard failure is caused by traumatic shocks in various patterns and is often modeled by a discrete-state random process, e.g., Homogeneous Poisson Process (HPP) [11][12][13], Nonhomogeneous Poisson Process (NHPP) [14][15][16], etc. Often, complex dependencies exist among the failure processes [17]. For example, [18] presents experimental data to show that erosion and corrosion can enhance each other and therefore accelerate the failure process. Also, it is observed in [19] that the dependency between creep and fatigue severely reduces the Time-To-Failure (TTF) of the a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 specimens that are exposed to high temperatures and heavy loads. To accurately describe the failure behavior affected by multiple failure processes, the possible dependencies among the failure processes need to be properly addressed.
In literature, various methods have been developed to model the dependencies among degradation processes and random shocks. Peng et al. [20] develop a dependency model where the arrived shocks lead to an abrupt increase of the degradation process. Wang and Pham [21] investigate systems subject to dependent competing risk, which suffer failures due to degradations and random shocks: the model is proposed of shocks that can cause immediate failure of the system, with a time-dependent probability p(t), or can increase the degradation level with probability 1−p(t). Cha and Finklstein [22] assume that a shock can lead to a hard failure with probability p(t), or can increase the degradation rate with probability 1−p(t). Jiang et al. [23] develop a model that considers that the threshold of hard failures can be shifted by random shocks. Rafiee et al. [24] consider that the degradation rate is increased by a series of shocks. Jiang et al. [1] categorize shocks into different shock zones based on their magnitudes and consider that shocks in different zones have different effects on the degradation process. Bagdonavicius et al. [25], Fan et al. [26] and Ye et al. [27] develop models that consider that the probability of hard failures is increased as the degradation process progresses. Huynh et al. [14,15] investigate maintenance strategies for a dependence model, where the intensity of the NHPP for random shock is a piecewise function of the degradation magnitude. Fan et al. [16] present a reliability model for sliding spools considering that the intensity of the NHPP describing the random shock process is a linear function of the degradation level.
For models that consider the dependencies between degradation and random shock processes, like these above, it is often too complicated, if not intractable, to evaluate system reliability analytically. Then, simulation methods, such as Monte Carlo methods [28], are used, often with limitations due to heavy computational burden. In this respect, Stochastic Hybrid Systems (SHS) [29] offer a new way to model the stochastic behavior of systems that involve both discrete and continuous states [30][31][32][33]. SHS describe the system's behavior by a set of differential equations and therefore, whose solution avoids the computational burdens of simulation methods. Various forms of SHS exist in literature (see Pola et al. [29] for a comparison). In this paper, we adopt the models recently developed by Hespanha in [34][35][36], which is similar to the Piecewise Deterministic Markov Process (PDMP) [37] but differs from it in that the continuous state variable follow Stochastic Differential Equations (SDEs), rather than Ordinary differential equations (ODEs). To the best of our knowledge, it is the first attempt to use SHS for modeling dependent failure processes.
It should be mentioned that SHS is similar to Stochastic Hybrid Automata (SHA), which is also applied in Dynamic Reliability (DR) assessment or Dynamic Probabilistic Risk Assessment (DPRA) [38,39]. Both methods model dynamic hybrid system behaviors that involve stochastic factors. SHA introduces less assumptions than SHS and resorts to Monte Carlo simulation for the calculations [40,41]; SHS, on the other hand, is able to describe the hybrid dynamics analytically or semi-analytically, by solving a set of Differential Equations (DEs) on the expense of introducing more assumptions [36]. The computational cost of SHS is, in general, less than that of SHA, but on the expense of more assumptions in particular with respect to the degradation models, whereby epistemic uncertainty (specifically model uncertainty) [42][43][44] is introduced. When applying the SHS in practice, then, care should be taken to ensure that the assumptions are consistent with the actual situation, in particular in case of systems characterized by complex and numerous dependencies among physical processes and failure behaviors. In this paper, we choose SHS because the type of dependent degradation and shock processes allows for modeling by SHS and, in general, SHS has a better computational performance than SHA.

SHS model
The state space of a SHS model is a combination of discrete and continuous states. Let us denote the discrete states by q(t), q(t) 2Q, where Q is a finite set containing all the possible discrete modes of the system. The continuous states are denoted by x(t), x(t)2R l . A SHS model is defined based on the following assumptions [34][35][36]: 1. The evolution of the continuous states is governed by a set of SDEs: where w t : R + ! R k is a k-dimensional Wiener process; f: Q×R l ! R l and g: Q×R l ! R l×k , respectively.
2. At any time t, if the system is in state (q(t), x(t)), it undergoes a transition with a rate λ ij (q(t), That is, the probability that the system undergoes a transition from state i to state j within the interval [t, t + Δt) is: 3. Whenever the system undergoes a state transition from state i to state j, it instantaneously applies the map ϕ ij (q(t), x(t)) to the current values of q(t) and x(t), so that their values are reset: where the notation a(t − ) represents the left-hand limit of the function a at time t.

SHS formulism for dependent failure processes
The modeling framework for dependent failure processes involves three elements, i.e., a model for the degradation process, a model for the shock process and a model for the dependency between the two processes. The following assumptions are made in order to model a dependent failure process in the framework of SHS: Assumption (1). The degradation processes are characterized by x(t) = (x 1 (t), x 2 (t), . . ., x l (t))2R l . The elements in x(t), x i (t), 1 i l, are performance parameters for the degradation processes and are independent from one another. Soft failure occurs whenever where H i is the failure threshold for the performance parameter x i (t).
Assumption (2). The system has n potential health states, i.e., q(t)2Q where q(t) is a discretestate variable that quantifies the system's health state at time t, and Q = {1, 2, . . ., n} is a set containing all the possible system states. When q(t) = n, a hard failure occurs.
Assumption (3). Transitions between system health states are triggered by the arrival of random shocks with the transition rate λ ij (q(t), x(t)), i, j 2 Q where the probability that the system jumps from state i to state j in the interval [t, t + Δt) is given by Eq (2).
Assumption (4). Between the transitions, the degradation of x(t) is characterized by the SDEs in Eq (1) for q(t) = 1, 2, Á Á Á, n − 1. When q(t) takes different values, the form of f(Á) and g(Á) can be changed to reflect the dependency behavior. When q(t) = n, which indicates that the system fails due to hard failure, we impose that x(t) = 0.
Assumption (5). An arrival random shock resets the current values of q(t) and x(t), using the reset map defined in Eq (3).
Assumption (6). System failure is caused by both soft and hard failures, whichever occurs first.
Given a dependent failure process, the following steps show how to model it in the framework of SHS: Step 1: Modeling degradation. In this step, the performance parameters x(t) are identified to characterize the degradation processes. For the performance parameters, the SDEs in Eq (1) are developed to describe their degradation, considering both deterministic and stochastic characteristics. The deterministic characteristics are often described based on the physical knowledge on the degradation processes (e.g., using the Physics-of-Failure (PoF) models [19]), while the stochastic characteristics are modeled by a Wiener process, as shown in Eq (1).
Step 2: Modeling random shocks. In SHS, random shocks are considered as transitions among the system health states. The transition rates, λ ij (q(t), x(t)), i, j 2 Q, need to be determined based on historical data or expert judgments.
Step 3: Modeling dependencies. Finally, the dependencies between the degradation processes and random shocks need to be considered. The dependencies can be modeled in various ways in SHS. For instance, by resetting the values for x(t), the reset map in Eq (3) can capture the influence of the random shock on the degradation process. Further, the functions f, g and even λ itself, as shown in Fig 1, are dependent on the current values of x(t) and q(t), which provides a versatile way to model the dependencies.
Note that in order to make sure that the developed SHS model is solvable in case that truncations techniques [36] are needed, for example Case 3 in this paper, the f i , g i , λ ij , ϕ ij , i, j 2 Q. in the SHS model have to be polynomial functions of x(t).

Conditional moments estimation
In this section, we derive the conditional expectations for the continuous state variables, i.e., The conditional expectations will be used in the next section for reliability analysis. Let us define a test function to be where m: = (m 1 , m 2 , . . ., m l ), m2N l , and x m :¼ x m 1 1 x m 2 2 Á Á Á x m l l , and let the m-order conditional moment of the continuous state x be For a general test function ψ(q(t), x(t)), ψ: Q×R l !R, which is twice continuously differentiable with respect to x, the evolution of its expected value is governed by Dynkin's formula [36]: where (Lψ)(q, x) is the extended generator of SHS and 8(q, x)2Q×R l , (Lψ)(q, x) is given by where @ψ / @x and @ 2 ψ / @x 2 denote the gradient and Hessian matrix of ψ(q,x) with respect to x, respectively; trace(A) is the trace of the matrix A, i.e., the sum of elements on its main diagonal. Substituting Eq (5) into Eq (7), we get a group of differential equations with respect to m m The evolution of m m ð Þ i t ð Þ can be depicted by solving Eq (9). The conditional moments can, then, be obtained by assigning proper values for m: if we let m = (0, 0, . . ., 0) we have If we let where m j denotes the jth element in m and p is a natural number, we have The conditional expectations, E½x p j t ð Þjq t ð Þ ¼ i; p 2 N; i 2 Q; j ¼ 1; 2; Á Á Á ; l, can, then, be calculated by combining Eqs (10) and (11).

Reliability analysis
From Assumption 6, system reliability can be expressed as: From the law of total probability, we have Since we assume that the degradation processes are independent from one another, Eq (13) becomes In Eq (14), Pr(q(t) = i) can be calculated by Eq (10), where m Ã,j and m ÃÃ,j are given by Based on FOSM [45], Pr(x j (t) < H j |q(t) = i) can be approximated by Substituting Eq (17) into Eq (14), the reliability of the system is approximated by wherem x j jq¼i t ð Þ,ŝ x j jq¼i t ð Þ are calculated by Eq (15).
The accuracy of the approximation by FOSM relies on the normality assumption: the random variables x j (t)|q(t) = i, i21, 2, . . ., n − 1, j = 1, 2, Á Á Á, l are normally distributed with mean value m x j jq¼i t ð Þ and standard deviation s x j jq¼i t ð Þ. In practice, the assumption does not always hold. Therefore, we also present an estimation method for the lower bound of the system reliability, using Markov inequality. According to Markov inequality [46], if X is a nonnegative random variable and a>0, then Using Eq (19), we obtain From Eqs (14) and (20), the lower bound of system reliability can, then, be derived: where m Ã,j has the same meaning as in Eq (16).

Results and discussion
Case 1 System description. The first case study to demonstrate the developed framework is adapted from [20]. A MEMS device is subject to two dependent failure processes, i.e., soft failures caused by wear and debris from shock loads, and hard failures due to spring fracture caused by shock loads [20]. The soft failure is modeled by a continuous degradation process and the hard failure is modeled by a random shock process. Dependence exists among the two processes: the arrival of a shock brings an additional contribution to the degradation process. Failures occur whenever one of the following two events happens: • the degradation process reaches its threshold, denoted by H; • a shock whose magnitude exceeds a critical level, denoted by D, occurs.
Additional assumptions include: 1. The continuous degradation process follows an SDE.
where w t 2 R is a standard Wiener process, μ β , σ β are constants and the initial degradation level at t = 0 is null.

The random shock process is a HPP with intensity λ.
3. The magnitudes of shock loads, denoted by W i , are i.i.d. random variables following a normal distribution, 4. The arrival of each shock brings a degradation increment d, which is a random variable following a normal distribution d $ N ðm d ; s 2 d Þ. Fig 2 to describe the behavior of the system. The system has two health states, q(t)2{1,2}. When q(t) = 1, the system is subject to the degradation process according to Eq (22). When q(t) = 2, the system fails due to hard failure and the degradation level is set to zero, i.e. x(t) = 0.

SHS formulation. A SHS model is constructed in
As shown in Fig 2, the initial health state of the system is state 1. The transition rates and reset maps of the SHS are defined as follows: In Eq (24)), the reset map ϕ 11 (q,x) models the dependency in Assumption Eq (4). Reliability analysis. We define the test functions c m ð Þ i q; x ð Þ; i 2 1; 2 f g; m 2 R; to be: Since x(t) = 0 when q(t) = 2, we only consider the 0-order conditional moment of the degradation at state 2, i.e. c 0 ð Þ 2 q; x ð Þ. By substituting Eqs (22), (23) and (24) into Eq (8), the extended generator of the SHS model is: According to Eqs (7) and (26), the differential equations governing the conditional moments are: (27), we can obtain the following set of differential equations: Since the system must belong to one of the two health modes, it is obvious that From Eqs (18) and (21), the estimated system reliability R e (t) and the lower bound R l (t) are calculated by where the conditional moments m 0 t ð Þ are obtained by solving the differential equations in Eqs (28) and (29).
Numerical calculation. A numerical example is conducted with the parameters in Table 1, taken from [20]. To solve the differential equations in Eqs (28) and (29), the solver based on Runge Kutta method in Matlab R2013a is used. In [20], the original model is simulated using Monte Carlo methods to analyze system reliability. A comparison is made between the results obtained by the developed methods in this paper and those obtained by Monte Carlo simulation. The sample size of the Monte Carlo simulation is 10 4 . In Fig 3(A)-3(C), the moments with order 0, 1 and 2 are compared with simulation results. In Fig 3(D), the estimated reliability and its lower bound are compared to the results of Monte Carlo simulation.
The comparisons show that the moments are accurately predicted by the SHS model. The estimated reliability by FOSM is consistent with the result by Monte Carlo simulation. The estimated lower bound provides a relatively conservative reliability estimation. The running time of Monte Carlo simulation is 5788.2 times more than that of the developed SHS-based approach.

Case 2
System description. The second case study to demonstrate the developed framework is adapted from [24]. A MEMS device is subject to two dependent failure processes, i.e., soft failures and hard failures. The soft failure is modeled by a continuous degradation process and the hard failure is modeled by a random shock process. Dependences between the two failure processes exist in the following two aspects: (1) the arrival of each shock brings a degradation increment; (2) the degradation rate increases when the system undergoes a series of shocks. The second type of dependence has been investigated based on four different shock models in doi:10.1371/journal.pone.0172680.t001 [24]: extreme shock, δ shock, m shock, and run shock models. In this section, we apply our framework by considering the first dependence and the second dependence triggered by the extreme shock model. Failure occurs whenever one of the following two events happens: • the degradation process reaches its threshold, denoted by H; • a shock whose magnitude exceeds a critical level, denoted by D 1 , occurs.
Additional assumptions include: 1. Random shocks arrive according to a HPP with intensity λ.
2. The magnitudes of shock loads, denoted by W i , are i.i.d. random variables following a nor- 3. The arrival of a shock whose magnitude is less than D 0 would bring a degradation increment d, which is a random variable following a normal distribution d $ N ðm d ; s 2 d Þ. 4. The arrival of a shock whose magnitude belongs to [D 0 , D 1 ) would trigger the change of degradation rate. Let J denote the total number of arrival shocks when the trigger shock occurs, and T J denote the time when the Jth shock arrives. Then, the continuous degradation process is modeled by the following SDEs, where w t 2R is a standard Wiener process, are constants, and the initial degradation level at t = 0 is null.

SHS formulation.
The SHS concerning this case is described by the state-transition diagram in Fig 4. The system has three health states, q(t)2{1,2,3} When q(t) = 1, the system is subject to the degradation at a low-level degradation rate. By contrast, when q(t) = 2, the system degrades at a high-level degradation rate. System's degradation under the first two health states evolves according to the following SDEs: When q(t) = 3, the system fails due to a hard failure and the degradation level is set to zero, i.e. x(t) = 0.
As shown in Fig 4, the initial health state of the system is state 1. The transition rates and reset maps of the SHS are defined as follows: ( l 23 q ð Þ :¼ where P 1 , P 2 , P 3 are calculated by In Eq (35), the reset maps ϕ 11 (q,x),ϕ 22 (q,x), model the dependency in assumption Eq (3) and the change of system degradation rate along with the transition from health state q = 1 to health state q = 2 model the dependency in assumption Eq (4).
Numerical calculation. A numerical example is conducted using the parameters in Table 2, taken from [24]. To solve the differential Eqs (40) and (41), the solver based on Runge Kutta method in Matlab R2013a is used. In [24], the original model is simulated by Monte Carlo to compute system reliability. A comparison is made between the results obtained by the doi:10.1371/journal.pone.0172680.t002 estimated lower bound provides a relatively conservative reliability estimation. Besides, the running time of Monte Carlo simulation is 1203.8 times more than that of the developed SHSbased approach.

Case 3
System descriptions. The third case study to demonstrate the developed framework is adapted from [25]. The failure behavior of bus tires is modeled by two dependent failure processes, i.e. soft failures caused by wear and hard failures of seven modes due to traumatic shocks. The soft failure is modeled by a continuous degradation process and the hard failure is modeled by a random shock process. Dependence exists among the two processes, that is, the probability that a traumatic shock occurs depends on the degradation level. Failures occur whenever one of the following two events happens: • the degradation process reaches its threshold, denoted by H; • a traumatic shock following a Cox process [47] with intensity λ(x) occurs.
Additional assumptions include: 1. The continuous degradation process is modeled by an SDE.
where w t 2R is a standard Wiener process, μ β ,σ β are constants and the initial degradation at t = 0 is denoted by x 0 .
2. Random shocks arrive according to a Cox process with intensity function λ(x) = δ + αx k ,k 2R, where δ and α are constants. In this paper, we let k = 1. Then, the intensity function is determined by SHS formulation. A SHS model is constructed in Fig 6 to describe the behavior of the system. The system has two health states, q(t)2{1,2} When q(t) = 1, the system is subject to the degradation process and degrades according to Eq (44). When q(t) = 2, the system fails due to a hard failure, and the degradation level is set to zero, i.e. x(t) = 0. where x 0 denotes the initial degradation magnitude and F(Á) represents the standard normal distribution function. A comparison is made between the results obtained by the developed methods in this paper and those obtained by solving Eqs (57) and (58) through Monte Carlo simulation. The sample size of the Monte Carlo simulation is 10 3 . In Fig 7(A)-7(C), the moments with order 0,1 and 2 are compared with those obtained by Eq (57). In Fig 7(D), the estimated reliability and its lower bound are compared to the results calculated by Eq (58). The comparisons show that the moments and system reliability are accurately predicted by the SHS model.

Conclusions
In this paper, a SHS-based modeling framework is developed for the reliability modeling and analysis of dependent failure processes, where degradation processes and random shock  processes compete to cause system failure and dependencies exist among these processes. In the developed model, the degradation process is modeled by SDEs and the shock process is characterized by transitions among the system health states. The dependencies among the two processes are modeled within the structure of the SHS model by the reset map, transition rates etc. The conditional moments for the state variables in the developed SHS model are calculated by deriving and solving a set of differential equations based on Dynkin's formula. Using these conditional moments, a reliability analysis method is developed to estimate the system reliability and its lower bound. Three case studies are conducted to demonstrate the developed methods. Comparisons to Monte Carlo simulations show that the developed method can achieve accurate reliability analysis results, while requiring much less computations than Monte Carlo simulations.
To apply the developed model in practice, the parameter values, such as the parameters in the SDEs that model the degradation processes, the transition rates that model the random shock processes, and the parameters in the reset maps that describe the dependency, need to be set based on historical data or expert judgments. Epistemic uncertainty might present when setting values for the parameters. Another source of epistemic (model) uncertainty is derived from the assumptions made for the present model. Treatment and calculation of epistemic uncertainty is an interesting problem that deserves further investigations.