System resilience distribution identification and analysis based on performance processes after disruptions

Yeqing Song; Ruiying Li

doi:10.1371/journal.pone.0276908

Abstract

Resilience is a system’s ability to withstand a disruption and return to a normal state quickly. It is a random variable due to the randomness of both the disruption and resilience behavior of a system. The distribution characteristics of resilience are the basis for resilience design and analysis, such as test sample size determination and assessment model selection. In this paper, we propose a systematic resilience distribution identification and analysis (RDIA) method based on a system’s performance processes after disruptions. Typical performance degradation/recovery processes have linear, exponential, and trigonometric functions, and they have three key parameters: the maximum performance degradation, the degradation duration, and the recovery duration. Using the Monte Carlo method, these three key parameters are first sampled according to their corresponding probability density functions. Combining the sample results with the given performance function type, the system performance curves after disruptions can be obtained. Then the sample resilience is computed using a deterministic resilience measure and the resilience distribution can be determined through candidate distribution identification, parameter estimation, and a goodness-of-fit test. Finally, we apply our RDIA method to systems with typical performance processes, and both the orthogonal experiment method and the control variable method are used to investigate the resilience distribution laws. The results show that the resilience of these systems follows the Weibull distribution. An end-to-end communication system is also used to explain how to apply this method with simulation or test data in practice.

Citation: Song Y, Li R (2022) System resilience distribution identification and analysis based on performance processes after disruptions. PLoS ONE 17(11): e0276908. https://doi.org/10.1371/journal.pone.0276908

Editor: Inés P. Mariño, Universidad Rey Juan Carlos, SPAIN

Received: April 12, 2022; Accepted: October 14, 2022; Published: November 3, 2022

Copyright: © 2022 Song, Li. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript and its Supporting information files.

Funding: This work was supported by the National Natural Science Foundation of China (61773044, https://www.nsfc.gov.cn/), and National Key Laboratory of Science and Technology on Reliability and Environmental Engineering (WDZC2019601A301). Both funds are received by R. L. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Ideally, systems are designed with the expectation that they will run smoothly and sustainably. However, a system often faces various disruptions, including external disruptions (such as natural disasters and malicious attacks) and internal failures of the system itself. A disruption may reduce the system performance, cause system failure, or even trigger a domino effect. Thus, a system should have the ability to withstand disruptions and return to a normal state quickly, i.e., resilience, to prevent and minimize the losses from disruptions.

The term “resilience” originates from the Latin word resiliere, which means to bounce back [1]. It was first proposed in ecology and was later applied to other fields such as communications systems and power systems. So far, the study of resilience has attracted great attention. Some scholars have concentrated on how to quantify system resilience [2–5], and some have discussed how to maintain and enhance system resilience [6–9]. Researchers also found that resilience was an internal property of a system, and claimed that it should be considered in the early design and development stages of systems [10, 11].

As a random variable due to the randomness of both the disruption and the system’s response to it, resilience distribution characteristics are the basis for system resilience design and analysis. These characteristics are useful in understanding system resilience from a statistical perspective, and they can be further used in resilience index determination, resilience test program design, and resilience assessment model selection. For different distribution types, the corresponding resilience index, test program and assessment model are different. For example, for system resilience with normal distribution and exponential distribution, when engineers determine the resilience index (e.g., the expected resilience), they will consider whether the resilience distribution is symmetric or skewed. In industry, for systems with similar resilience distributions, unified test and assessment specifications can be formulated for the convenience of engineers. As is known, for system characteristics with different types of distributions, the sample sizes and the assessment methods are very different.

However, to the best of our knowledge, few studies have analyzed resilience distribution, and only some research has provided resilience analysis results using distribution forms. For example, Ouyang et al. [12] calculated the resilience distribution of a power transmission grid for different hazard scenarios. Pant et al. [13] obtained the cumulative distribution function (CDF) of a container terminal’s resilience using a simulation for given recovery orders and recovery probability distributions. Ba-Alawi et al. [14] conducted a resilience assessment of the membrane bioreactor in a wastewater treatment plant based on the performance curves, with the assumption that the failure data followed a lognormal distribution. Zinetullina et al. [15] proposed a quantitative resilience assessment for chemical process systems with normally distributed variables such as timing and precision. On the other hand, some researchers analyzed the resilience laws of complex networks such as transportation systems, water supply systems, and electric power systems. For example, Orosz et al. [16] explored the relationship between resilience and the minimal production flow rate of a process network. Mou et al. [17] found that the resilience of a crude oil transportation network decreased at a steady rate during random attacks and decreased sharply during deliberate attacks. They also found that the density and centrality of the network were negatively correlated with resilience, while the connectivity and size of the network were positively correlated with resilience. Table 1 summarizes these related works, and one can see that how to analyze the system resilience distribution is still an open problem.

Table 1. Comparison of related works.

https://doi.org/10.1371/journal.pone.0276908.t001

In this study, we first summarize the system’s possible behavior after a disruption and propose our resilience distribution identification and analysis (RDIA) method. Then, using this method, the resilience distribution for some typical performance processes is analyzed. The results show that the resilience of these systems follows the Weibull distribution.

Problem description

A system’s resilience is determined by its possible response after a disruption. To find the resilience distribution, we analyze the system’s response first.

After a disruption, the system may experience three main phases: i) the degradation phase, ii) the recovery phase, and iii) the new steady phase, as shown in Fig 1. In the beginning, the system runs normally. Then, a disruption occurs at time t₀, and the system performance begins to degrade, i.e., the system enters the degradation phase. Recovery actions are then taken to cope with the disruption, and the system begins to recover and enters the recovery phase. The performance degradation time and the recovery time are denoted as T_d and T_r, respectively, and the maximum performance degradation is denoted as Q_L.

Fig 1. A typical performance process after a disruption.

https://doi.org/10.1371/journal.pone.0276908.g001

The system’s ability to absorb, adapt, and recover from the disruption determines its performance degradation and recovery process. Cimellaro et al. [18] proposed three types of performance recovery functions, i.e., linear, exponential, and trigonometric functions. Similarly, these functions were also used to describe the performance degradation processes (see [19–21]). For those systems that can be fully recovered, these functions are as follows:

Degradation function (from time t₀ to t₁) (1)
Recovery function (from time t₁ to t₂) (2)

where b is the scale parameter of the exponential function (b > 0), which determines the extent to which the exponential function deviates from linearity. The three types of performance process functions are shown in Fig 2, and they can be used in the following situations:

The linear function is suitable for systems with a constant performance change rate. It is generally used when there is limited information regarding emergency preparation, available resources, and the system’s response to the disruptions.
The exponential function can be applied for systems with a ‘first slow then fast’ performance degradation process or with a ‘first fast then slow’ recovery process. A larger b indicates a more significant deviation from the linear function and a more obvious change rate.
- An exponential degradation process indicates that the system can resist the disruption at the beginning, and the degradation increases over time because all possible resistance strategies have been used.
- An exponential recovery process implies that the system’s performance is restored quickly after the recovery action starts, and the speed slows down later. This phenomenon may be caused by a repair sequence in which those actions with a large recovery effect are taken first.
The trigonometric function is suitable for systems with slower performance degradation/recovery at both the beginning and the end of the process, but with a faster performance change rate in the middle.
- A trigonometric degradation process means that the disruption has a slight impact on the system in the initial stage, but its impact increases along with the disruption intensity. At the end of the process, the performance is not easily affected, and the degradation speed slows down.
- A trigonometric recovery process represents the fact that the system lacks or has limited resources at the beginning. Once the resource is obtained, the recovery speed increases quickly, as those actions that have a large recovery effect on the system are taken first. Therefore, the recovery speed increases at the beginning and then slows down slightly at the end.
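To make the simplest of these three shapes concrete, the following is a minimal sketch of a linear degradation/recovery process with full recovery. The function name, the argument names, and the normalized nominal performance q0 = 1 are assumptions made for illustration; the exponential and trigonometric forms are not reproduced here.

```python
def linear_performance(t, t0, q_loss, t_d, t_r, q0=1.0):
    """Performance at time t for a linear degradation/recovery process.

    Assumptions (illustrative): nominal performance q0 is normalized,
    the system fully recovers, and both phases have a constant rate.
    """
    t1 = t0 + t_d  # end of the degradation phase
    t2 = t1 + t_r  # end of the recovery phase
    if t <= t0 or t >= t2:
        return q0  # before the disruption or after full recovery
    if t <= t1:
        # constant-rate drop from q0 to q0 - q_loss over t_d
        return q0 - q_loss * (t - t0) / t_d
    # constant-rate climb back from q0 - q_loss to q0 over t_r
    return q0 - q_loss * (t2 - t) / t_r
```

For example, with q_loss = 0.3, t_d = 30, and t_r = 60, the performance bottoms out at 0.7 when t = t0 + 30 and is back to 1.0 at t = t0 + 90.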

Fig 2. Typical system performance process functions.

https://doi.org/10.1371/journal.pone.0276908.g002

There are three key parameters in the performance degradation and recovery process after a disruption, including the maximum performance degradation Q_L, the degradation duration T_d, and the recovery duration T_r. Usually, Q_L is determined by the system redundancy and the disruption severity, and T_d and T_r are affected by the recovery strategies such as resource allocation and repair sequences. Researchers have also analyzed these parameter characteristics. For example, Wang et al. [22] stated that both the instant residual availability and the recovery time of a link in a road network had a confirmed half-normal distribution when a link was randomly disrupted. Ouyang et al. [12] used a power law distribution to model the hazard intensity of a power transmission system in Harris County and applied the normal distribution to model the repair time of failed substations. An investigation made by Carreras et al. [23] showed that the blackout time intervals for power systems approximately satisfied an exponential distribution. Weiss and Rosenthal [24] compared the durations of the supply disruption of an economic order quantity inventory system with a normal distribution and an exponential distribution. Upadhya and Srinivasan [25] applied an exponential distribution for a fighter aircraft’s repair time and log-normal distribution for the logistic delays. Myrefelt [26] found that log-normal distributions best fit the mean time to repair (MTTR) and mean time between failures (MTBF) data for a heating, ventilation, and air conditioning system.

Method

For the system with the given system response characteristics, we propose a Resilience Distribution Identification and Analysis (RDIA) method, and analyze the system resilience distributions. Our RDIA method makes the following assumptions:

The system can finally return to the initial performance after recovery;
Both the system performance degradation and the recovery processes are monotonic.

Fig 3 expresses the steps of the RDIA method. The specific steps are as follows:

Fig 3. Flowchart of the RDIA method.

https://doi.org/10.1371/journal.pone.0276908.g003

Obtaining resilience samples

To accurately identify the resilience distribution, the resilience data should be sufficient. In the Monte Carlo simulation, it is assumed that the simulation error is required to be no more than ±ε at a confidence level of 1 − α. According to the central limit theorem, the number of simulations n should satisfy the following inequality (see [25, 27]):

$$n \ge \left(\frac{z_{\alpha/2}\,\sigma}{\varepsilon}\right)^{2} \tag{3}$$

where z_{α/2} is the upper α/2 quantile of the standard normal distribution and σ² is the variance of the resilience samples.

If the variance σ² is unknown, the sample variance S² can be taken as an unbiased estimate of the variance. In general, it is recommended that n ≥ 1000.
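The sample-size requirement can be checked numerically. The sketch below assumes the usual central-limit form n ≥ (z·σ/ε)², with z the two-sided standard normal quantile; the function names are hypothetical, and σ would in practice be replaced by the sample standard deviation.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, eps, alpha=0.05):
    """Smallest n with half-width z * sigma / sqrt(n) <= eps (CLT approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    return math.ceil((z * sigma / eps) ** 2)


def simulation_error(sigma, n, alpha=0.05):
    """Half-width of the (1 - alpha) confidence interval after n simulations."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / math.sqrt(n)
```

Calling `required_sample_size(sigma, eps)` gives the number of runs for a target error, while `simulation_error(sigma, n)` answers the reverse question of what error a fixed number of runs achieves.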

The Monte Carlo method, which uses “frequency” to approximate “probability” with enough experimental data, is applied to obtain the performance behavior samples after the disruption. The steps are as follows:

Sampling the key parameters of the performance process after disruptions: n sets of the three key parameters of the system (i.e., T_d, T_r, and Q_L) are sampled according to their distributions.
Determining the resilience curve: The resilience curve is generated using the key parameters sampled above and the performance degradation/recovery functions. Then n performance processes after disruptions are determined.
Obtaining the resilience samples: n resilience values are calculated using the resilience measure in Eq 4 (see [27]), i.e.,

$$R = \frac{\int_{t_0}^{t_0+T_a} Q(t)\,dt}{\int_{t_0}^{t_0+T_a} Q_0(t)\,dt} \tag{4}$$

where R is the system resilience, Q(t) and Q₀(t) represent the system performance at time t with and without disruption, respectively, t₀ is the disruption occurrence time, and T_a is the maximum allowable recovery time. This measure reflects the system’s average performance after a disruption within a certain time period.
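The three steps above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: a linear degradation/recovery process, exponentially distributed T_d and T_r, a discrete uniform Q_L in whole percentage points, normalized undisrupted performance Q₀(t) = 1, t₀ = 0, and trapezoidal integration of the performance curve. All function and parameter names are hypothetical.

```python
import random


def trapezoid(f, a, b, steps):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


def sample_resilience(n=1000, mean_td=30.0, mean_tr=30.0, max_ql=30,
                      t_a=1000.0, steps=500, seed=0):
    """Draw n resilience samples for a linear process (illustrative sketch)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Step 1: sample the three key parameters
        t_d = rng.expovariate(1.0 / mean_td)      # degradation duration
        t_r = rng.expovariate(1.0 / mean_tr)      # recovery duration
        q_l = rng.randint(1, max_ql) / 100.0      # maximum degradation

        # Step 2: build the performance curve (disruption at t = 0)
        def q(t, t_d=t_d, t_r=t_r, q_l=q_l):
            if t < t_d:
                return 1.0 - q_l * t / t_d
            if t < t_d + t_r:
                return 1.0 - q_l * (t_d + t_r - t) / t_r
            return 1.0

        # Step 3: integrate Q(t) over [0, T_a]; the integral of Q0(t) = 1 is T_a
        samples.append(trapezoid(q, 0.0, t_a, steps) / t_a)
    return samples
```

A call such as `sample_resilience(n=5000)` returns a list of values in (0, 1] that can then be passed to the distribution fitting and testing steps. The integration grid is deliberately coarse to keep the pure-Python loop fast; a finer grid or an exact piecewise integral would reduce the discretization error.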

Probability distribution fitting and testing

After obtaining the resilience samples, the probability distribution fitting and testing have the following three steps [28]:

Candidate distribution identification.

To construct the resilience sample distribution histogram, we first calculate the number of groups as: (5) where k is the number of groups, [⋅] indicates rounding, and n is the sample size. Then we combine those groups with a sample size of less than five and construct the resilience sample distribution histogram. If the histogram is symmetrically or approximately symmetrically distributed, the resilience samples may obey a normal distribution or a Weibull distribution with a shape parameter between 3 and 4. If the data are right-skewed, an exponential distribution, log-normal distribution, and Weibull distribution should be considered [29]. After possible theoretical distributions are selected, the probability plot can further be used for verification, and the distribution with the best fitting effect (i.e., the points in the probability plot lie approximately on a line) can be selected.

Parameter estimation.

The methods of parameter estimation mainly include moment estimation and maximum likelihood estimation (MLE). MLE generally gives better estimates when the form of the population distribution is known, since the information contained in the samples can be fully utilized. In this study, MLE is used for the resilience parameter estimation. Let the probability density function of the resilience population be f(x; θ₁, θ₂, …, θ_m), where θ_i (i = 1, 2, …, m) is an unknown parameter, and let x₁, x₂, …, x_n be the observed resilience sample values obtained by the Monte Carlo simulation. Then the likelihood function can be written as

$$L(\theta_1, \theta_2, \ldots, \theta_m) = \prod_{j=1}^{n} f(x_j; \theta_1, \theta_2, \ldots, \theta_m).$$

Taking the logarithm, we have

$$\ln[L(\theta_1, \theta_2, \ldots, \theta_m)] = \sum_{j=1}^{n} \ln f(x_j; \theta_1, \theta_2, \ldots, \theta_m).$$

Since the obtained resilience samples are continuous random variables, the likelihood functions are differentiable. Then the likelihood equations can be established as

$$\frac{\partial \ln[L(\theta_1, \theta_2, \ldots, \theta_m)]}{\partial \theta_i} = 0, \quad i = 1, 2, \ldots, m.$$

By solving these equations, we can obtain the estimated parameters (θ₁, θ₂, …, θ_m).
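As a worked illustration of the likelihood equations, consider a case simpler than the resilience distribution itself: a one-parameter exponential density f(x; λ) = λe^{−λx}, the form assumed later for T_d and T_r. The choice of this density and the symbols λ̂ and x̄ are for illustration only.

```latex
\begin{aligned}
L(\lambda) &= \prod_{j=1}^{n} \lambda e^{-\lambda x_j}
            = \lambda^{n} \exp\Bigl(-\lambda \sum_{j=1}^{n} x_j\Bigr), \\
\ln L(\lambda) &= n \ln \lambda - \lambda \sum_{j=1}^{n} x_j, \\
\frac{\partial \ln L(\lambda)}{\partial \lambda}
           &= \frac{n}{\lambda} - \sum_{j=1}^{n} x_j = 0
\;\Longrightarrow\;
\hat{\lambda} = \frac{n}{\sum_{j=1}^{n} x_j} = \frac{1}{\bar{x}}.
\end{aligned}
```

For distributions with several parameters, the likelihood equations generally have no closed-form solution and are solved numerically.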

Goodness-of-fit test.

Considering the diversity of the resilience sample distributions and the large sample data obtained with the Monte Carlo method, in this study, the chi-squared test is applied for the goodness-of-fit test. This method uses the chi-squared statistic χ² (see Eq 6) to represent the deviation between the observed values and the expected distribution. According to the law of large numbers, when the samples obey a certain distribution, the number of samples in each group should be close to the number calculated using the theoretical distribution. Therefore, the smaller the chi-squared statistic is, the smaller the deviation is. The chi-squared statistic can be calculated as follows:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - n p_i)^2}{n p_i} \tag{6}$$

where k is the number of groups, O_i is the number of samples in group i, n is the sample size, p_i is the probability that a sample falls into group i if the distribution assumption holds, p_i = F(x_i) − F(x_{i−1}), x_{i−1} and x_i are the lower and upper bounds of group i, and F(x) is the CDF.

The maximum likelihood estimates are used to replace the unknown parameters in the resilience distribution. Under the distribution assumption, the statistic approximately follows the chi-squared distribution with k − m − 1 degrees of freedom, where m is the number of estimated parameters. According to the given significance level α, the critical value can be found in the chi-squared table as the corresponding quantile of this distribution. Then, this value is compared with the chi-squared statistic calculated using Eq 6. If the statistic is less than the critical value, the sample observations are considered to obey the assumed distribution, i.e., the resilience distribution is obtained. Otherwise, other candidate distributions should be considered.
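The test statistic itself is straightforward to compute once the groups and the fitted CDF are available. The following is a minimal sketch under stated assumptions: the group boundaries `edges` span all samples, every group has a positive theoretical probability, and the critical value is looked up separately in a chi-squared table. The function and argument names are hypothetical.

```python
from bisect import bisect_right


def chi_squared_statistic(samples, edges, cdf):
    """Sum over groups of (O_i - n*p_i)^2 / (n*p_i).

    samples: observed values; edges: sorted group boundaries x_0 < ... < x_k;
    cdf: fitted cumulative distribution function F(x).
    """
    n = len(samples)
    k = len(edges) - 1
    observed = [0] * k
    for x in samples:
        # group i covers [x_i, x_{i+1}); the last group also includes x_k
        i = max(0, min(bisect_right(edges, x) - 1, k - 1))
        observed[i] += 1
    stat = 0.0
    for i in range(k):
        p_i = cdf(edges[i + 1]) - cdf(edges[i])  # theoretical group probability
        stat += (observed[i] - n * p_i) ** 2 / (n * p_i)
    return stat
```

The assumed distribution is retained when the returned value is below the tabulated critical value for k − m − 1 degrees of freedom at the chosen significance level.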

Experiments and discussions

Using our RDIA method described above, this section discusses the investigation of the resilience distribution of systems with typical performance processes after disruptions. In this study, it is given that:

Both the system performance degradation and the recovery processes follow linear, exponential, or trigonometric functions as Eqs 1 and 2 state.
Both the degradation duration T_d and the recovery duration T_r of the system performance follow negative exponential distributions.
The maximum performance degradation Q_L follows a discrete uniform distribution.
The maximum allowable recovery time T_a is 1000 s.

Resilience distribution identification and analysis

Experiment design.

To investigate the resilience distribution of systems with different performance degradation/recovery process functions and parameters, we design experiments with five three-level factors according to the L18(3⁷) orthogonal array, as shown in Table 2. The performance degradation duration T_d and recovery duration T_r obey the negative exponential distribution with the mean values of 1/λ = 30, 60, 90 seconds. The maximum performance degradation obeys discrete uniform distributions with P(Q_L = i%) = 1/N for i = 1, 2, …, N, where N% = max(Q_L) and max(Q_L) = 30%, 60%, 90%. To facilitate the research, b is assumed to be ln(200) in the exponential function.

Table 2. Experimental design.

https://doi.org/10.1371/journal.pone.0276908.t002

Procedure and results.

Experiment 1 in Table 2 is taken as an example to illustrate the RDIA process and results. Letting the number of simulations n be 5000, according to Eq 3, we can compute the simulation error as ε ≤ 0.08% with a confidence level of 1 − α = 95%. The histogram of the resilience samples obtained from the simulation is shown in Fig 4(a). Using the distribution fitting method, one can find that the system resilience obeys a Weibull distribution with the shape parameter ξ = −0.99988, scale parameter σ = 0.004564, and position parameter μ = 0.995432. From the empirical and theoretical CDF plots in Fig 4(b) and the probability plot in Fig 4(c), one can see that the empirical distribution curve of the resilience samples obtained in Experiment 1 is highly consistent with the fitted distribution curve, and only a slight deviation exists for some samples with low system resilience. Using a chi-squared test, the chi-squared statistic is computed to be χ² = 9.3021. Since this statistic is less than the critical value at a significance level of α = 5%, we can conclude that the system resilience with the process functions and parameters in Experiment 1 obeys the Weibull distribution at a significance level of 5%.

Fig 4. Statistical results for Experiment 1 in Table 2.

a) histogram; b) CDF; c) probability plot (the fitted Weibull distribution is used as the theoretical function).

https://doi.org/10.1371/journal.pone.0276908.g004

We also find that the resilience samples of all 18 experiments in Table 2 follow the Weibull distribution. The parameter estimation and the chi-squared test results of these 18 experiments are shown in Table 3. One can see that 13 of the chi-squared statistics in Table 3 pass the test at a significance level of α = 5%, 4 pass at α = 1%, and 1 passes at α = 0.5%.

Table 3. Distribution fitting results for experiments in Table 2.

https://doi.org/10.1371/journal.pone.0276908.t003

Effects of performance process functions and parameters

Experiment design.

Using the control variable method, we design 13 types of experiments to investigate the influence of the system performance process parameters on the resilience distribution. The specific parameters are shown in Table 4. Experiment Type 1 in Table 4 is the control group, and the other experiment types are experimental groups. Comparing the experimental results for Experiment Types 2–5, Experiment Types 6–9, and Experiment Types 10–13 with those for Experiment Type 1, we can investigate how the random variables T_r, T_d, and Q_L, respectively, affect the resilience distribution.

Table 4. Experiments to find how the performance process parameters affect the resilience distribution.

https://doi.org/10.1371/journal.pone.0276908.t004

Combining the 13 types of performance process parameters in Table 4 with the 3 system performance process functions in Eqs 1 and 2, we have 3 × 13 experiments. We use L/E/T to represent the performance processes that follow the linear/exponential/trigonometric functions, respectively. We use a letter and a number together to represent the experiment code, e.g., Experiment L1 is a performance process with a linear function and Type 1 parameters in Table 4. For the experiments with exponential functions in Table 4, b is set to ln(200). The experiments in Table 5 are added to discuss how the parameter b of the exponential function affects the system resilience distribution. Experiment E1 is used as the control group, and the other four experiments are experimental groups.

Table 5. Experiments to find the effect of b on the resilience distribution.

https://doi.org/10.1371/journal.pone.0276908.t005

Discussion.

The procedures for the experiments in Tables 4 and 5 are similar to those in Table 2, so we only discuss the experimental results here. Fig 5 provides the box plots of the system resilience with a parameter change.

Fig 5. Sample box plots of experiments in Tables 4 and 5.

a) effect of performance process functions; b) effect of T_d; c) effect of T_r; d) effect of Q_L. The yellow dots represent the mean resilience, the red lines represent the median resilience, the tops and bottoms of the boxes represent the 75th and 25th percentiles, respectively, and the top and bottom lines represent the maximum and minimum resilience, respectively.

https://doi.org/10.1371/journal.pone.0276908.g005

From Fig 5, one can see that the medians of all the boxes are generally greater than the means, which indicates that the resilience distributions are left-skewed, i.e., most samples lie close to 1 and a long tail extends toward low resilience. This is because the system can return to the normal state within the maximum allowable recovery time T_a in most situations. It can also be found that the maximum resilience of all experiments is very close to 1. This is because the number of Monte Carlo iterations is large enough in each experiment, and there are always some processes with small performance degradations and short degradation/recovery durations. The results show that all the samples of the experiments in Tables 4 and 5 obey the Weibull distribution. The impacts of the performance process functions and parameters on the system resilience distribution are analyzed below:

Effect of performance process functions. According to Fig 5(a), the process functions affect the system resilience distributions. Fig 6 shows the sample resilience CDFs for all three types of resilience functions. Comparing the exponential resilience functions with different b, one can see that the CDF curve becomes steeper and most of the resilience samples are higher with a larger b. When b becomes closer to 0, the sample distribution becomes closer to those with linear and trigonometric resilience functions. This is because a small b indicates a small change rate of the exponential performance process, and in this case, the exponential function is close to the linear function.
The parameters of the Weibull distributions obtained through distribution fitting for the experiments in Table 5 are shown in Fig 7. One can see that the shape parameter ξ does not change obviously with b, but a smaller b results in a larger scale parameter σ, a smaller position parameter μ, and a less resilient system.
Fig 8 compares the parameters estimated by fitting the Weibull distribution for the experiments in Table 4. One can see that the scale parameter σ and the position parameter μ are almost the same for experiments with linear and trigonometric performance process functions, and their shape parameters ξ are all near -1. Although the linear and trigonometric functions are different in curve shapes, their integrations within the same time period are similar, as shown in Fig 2. Moreover, the scale parameter σ of the experiments with exponential performance process functions is smaller than those for the other two types of functions, and the position parameter μ is larger than those for the other two types.
Effect of T_d and T_r. Fig 5(b) and 5(c) show the effect of the degradation duration and the recovery duration on the system resilience distribution. One can see that the samples are more dispersed for the exponentially distributed T_r (or T_d) with a larger mean 1/λ. The system resilience decreases as the mean of T_r (or T_d) increases. This is because the longer T_r (or T_d) is, the longer the system operates at a low performance level, resulting in lower resilience. The effects of T_r and T_d are almost the same since they both determine the duration of the system operating with degraded performance.
The experiments with linear resilience functions are taken as an example. Their shape parameters ξ show no clear trend, and their scale parameters σ and position parameters μ are shown in Fig 9. One can see that a larger mean of T_r (or T_d) leads to a larger σ and a smaller μ, which results in a resilience distribution with a wider range.
Fig 10 shows the sample CDF of Experiments L1-L9. It can be seen that the effects of T_d and T_r on the resilience distributions are basically the same. This is because both T_d and T_r follow exponential distributions with the same parameter, and they jointly determine the time when the system operates at a degraded performance.
Effect of Q_L. Fig 5(d) explores the effect of the maximum performance degradation Q_L on the system resilience distribution. One can see that the mean resilience increases as max(Q_L) decreases, and the sample resilience is more concentrated with a smaller max(Q_L). This is expected because the maximum performance degradation Q_L directly determines the maximum performance that the system loses. Therefore, a smaller max(Q_L) implies a larger system resilience. Taking experiments with linear resilience functions as an example, their estimated parameters are shown in Fig 11. One can see that a larger max(Q_L) gives a larger scale parameter σ and a smaller position parameter μ, indicating a lower system resilience.
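For the linear case, these trends can also be read from a closed-form expression. The derivation below is a sketch under stated assumptions: normalized undisrupted performance Q₀(t) = 1, full recovery within the allowable time (T_d + T_r ≤ T_a), and R denoting the resilience measure of Eq 4. The lost performance is then a triangle with base T_d + T_r and height Q_L.

```latex
\begin{aligned}
\int_{t_0}^{t_0+T_a} Q(t)\,dt
  &= T_a - \tfrac{1}{2} Q_L T_d - \tfrac{1}{2} Q_L T_r, \\
R &= \frac{\int_{t_0}^{t_0+T_a} Q(t)\,dt}{\int_{t_0}^{t_0+T_a} Q_0(t)\,dt}
   = 1 - \frac{Q_L\,(T_d + T_r)}{2\,T_a}.
\end{aligned}
```

Under these assumptions, T_d and T_r enter only through their sum, which is consistent with their nearly identical effects, and R decreases linearly in Q_L. For instance, Q_L = 0.3, T_d = T_r = 30 s, and T_a = 1000 s give R = 1 − 0.3 × 60 / 2000 = 0.991.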

Fig 6. Sample CDFs of experiments in Tables 4 and 5.

https://doi.org/10.1371/journal.pone.0276908.g006

Fig 7. Estimated parameters of the experiments in Table 5.

https://doi.org/10.1371/journal.pone.0276908.g007

Fig 8. Estimated parameters of the experiments in Table 4.

https://doi.org/10.1371/journal.pone.0276908.g008

Fig 9. Estimated parameters of Experiments L1-L9.

https://doi.org/10.1371/journal.pone.0276908.g009

Fig 10. Sample CDFs of Experiments L1-L9.

https://doi.org/10.1371/journal.pone.0276908.g010

Fig 11. Estimated parameters for Experiments L1 and L10-L13.

https://doi.org/10.1371/journal.pone.0276908.g011

Case study

Here, a wireless end-to-end communication system under random electromagnetic interference is used as an example to explain our RDIA method with simulation or test data in practice. In this case, the bit error rate (BER) is chosen as the key performance index, and the maximum allowable recovery time is 12 minutes.

The system consists of two fixed nodes, a transmitting one and a receiving one, and the distance between them is 1 km. The transmitting node generates 1024-bit packets at a rate of 1.0 packets/second, and transmits them at 1024 bits/second over the channel. A jammer is applied to simulate the electromagnetic interference. It moves in a straight line at a constant speed within a 4 km × 8 km area, as Fig 12 shows. Its speed follows a normal distribution, and its start and end coordinates follow uniform distributions. The parameters are shown in Table 6.

Table 6. Parameters of the jammer movement.

https://doi.org/10.1371/journal.pone.0276908.t006

Fig 12. The end-to-end communication system under random electromagnetic interference.

https://doi.org/10.1371/journal.pone.0276908.g012

Let the simulation error be ε ≤ 0.08 at a confidence level of 1 − α = 95%; the number of simulation iterations n is then determined as 50 according to Eq 3. Because this is not a numerical study, we relax the requirement on the simulation error. In each simulation run, we randomly generate the jammer trajectory according to the parameters shown in Table 6. The system BER is collected every 0.5 s during the simulation, and the system resilience of each sample is calculated using Eq 4. Fig 13 shows the BER under the electromagnetic interference shown in Fig 12. As the distance between the jammer and the receiving node is first far, then near, and later far, the system BER declines first, then rises, and ends up at a degraded level.

Fig 13. BER under the electromagnetic interference shown in Fig 12.

https://doi.org/10.1371/journal.pone.0276908.g013

Since the BER is a smaller-the-better type parameter, we use the min-max normalization method to obtain the normalized performance data, and then smooth them. Fig 14 shows the results obtained after the BER data in Fig 13 are normalized and smoothed. After calculating the system resilience under each electromagnetic interference, we fit the data using a generalized extreme value distribution with the shape parameter ξ = −1.012, scale parameter σ = 0.09541, and location parameter μ = 0.9056. The PDF and CDF of both the sample data and the fitting results can be seen in Fig 15. The chi-squared statistic is computed to be χ² = 4.595. Comparing it with the critical value at the 95% confidence level, we conclude that the system resilience can be considered to obey the generalized extreme value distribution.
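The preprocessing step can be sketched as follows. The bounds used for the min-max normalization and the smoothing method are not specified here, so this sketch assumes the observed minimum and maximum of each BER trace and a centered moving average; both choices, as well as the function names and the window length, are illustrative assumptions.

```python
def normalize_smaller_is_better(values):
    """Min-max normalization mapping the lowest BER to 1 and the highest to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # flat trace: no degradation observed
    return [(hi - v) / (hi - lo) for v in values]


def moving_average(values, window=5):
    """Centered moving average; the window shrinks near both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed
```

Applying `moving_average(normalize_smaller_is_better(ber_trace))` to a BER trace sampled every 0.5 s yields a larger-the-better performance curve that can be integrated with Eq 4.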

Fig 14. Normalized and smoothed performance under the electromagnetic interference shown in Fig 12.

https://doi.org/10.1371/journal.pone.0276908.g014

Fig 15. PDF and CDF of system resilience.

a) PDF; b) CDF.

https://doi.org/10.1371/journal.pone.0276908.g015
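The goodness-of-fit check for the fitted generalized extreme value distribution can be sketched as follows. The sketch assumes the common parameterization F(x) = exp(−(1 + ξ(x − μ)/σ)^(−1/ξ)), in which a negative shape parameter gives a bounded upper tail; whether the fitting tool used in the case study follows this sign convention is an assumption. The bin edges passed to `chi_squared_statistic` are chosen by the user, since the binning used in the case study is not stated.

```python
import bisect
import math


def gev_cdf(x, xi, sigma, mu):
    """CDF of the generalized extreme value distribution (xi is the shape)."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))  # Gumbel limit
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: above the upper bound (xi < 0)
        # or below the lower bound (xi > 0)
        return 1.0 if xi < 0.0 else 0.0
    return math.exp(-t ** (-1.0 / xi))


def chi_squared_statistic(samples, cdf, edges):
    """Pearson statistic over the bins (-inf, e0], (e0, e1], ..., (ek, +inf)."""
    n = len(samples)
    observed = [0] * (len(edges) + 1)
    for x in samples:
        observed[bisect.bisect_left(edges, x)] += 1
    cumulative = [0.0] + [cdf(e) for e in edges] + [1.0]
    stat = 0.0
    for i, obs in enumerate(observed):
        expected = n * (cumulative[i + 1] - cumulative[i])
        if expected <= 0.0:
            raise ValueError("bin with zero expected count; merge bins")
        stat += (obs - expected) ** 2 / expected
    return stat


def fitted_cdf(x):
    """GEV CDF with the parameter values reported in the text."""
    return gev_cdf(x, xi=-1.012, sigma=0.09541, mu=0.9056)
```

With the resilience samples in hand, `chi_squared_statistic(samples, fitted_cdf, edges)` would be compared with the chi-squared critical value whose degrees of freedom equal the number of bins minus one minus the three estimated parameters.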

Conclusion

Resilience reflects the ability of a system to withstand disruptions and quickly recover from them. It is an internal attribute of a system, as well as a random variable due to the randomness of both the disruption and the system’s response to it. To assist with system resilience design and analysis, this paper proposes a resilience distribution identification and analysis (RDIA) method to find the resilience distribution from a statistical perspective. Based on the functions and key parameters of the system’s performance processes after disruptions, this method uses the Monte Carlo method to obtain resilience samples. The system resilience distribution is then determined through candidate distribution identification, parameter estimation, and a goodness-of-fit test. Finally, the resilience distributions of systems with typical performance processes are analyzed. The results show that the resilience of these systems obeys the Weibull distribution. Our method requires the system’s performance degradation/recovery function and the distributions of the related parameters. In practice, such data are not always easy to obtain. In this situation, we can collect several sets of the system’s performance data over time after disruptions. The case study shows how to inject disruptions and collect performance response data for system resilience distribution identification. Further, if no data are available, the classical resilience distribution types for typical performance processes can be used first, and the distribution type can then be updated as data are collected.

The contributions of our paper include the following:

A systematic method is proposed to identify the system resilience distribution based on the system performance processes after disruptions.
The resilience distributions are analyzed for typical performance degradation/recovery processes with linear, exponential, and trigonometric functions, with exponentially distributed durations and a discrete uniformly distributed maximum performance degradation. The results show that the resilience obeys the Weibull distribution.
Our method aids understanding of system resilience from a statistical perspective, and the resilience distribution obtained can be further used for system resilience design and analysis.
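As an illustration of the sampling step behind these contributions, the following sketch draws resilience samples for a linear degradation/recovery process with exponentially distributed durations and a discrete uniform maximum degradation. The resilience measure used here (the ratio of the area under the normalized performance curve to the nominal area over a fixed horizon) is an assumption standing in for the measure of Eq 4, and all numerical values and function names are hypothetical.

```python
import random


def triangle_loss(depth, t_d, t_r, horizon):
    """Area of the performance loss up to `horizon` for a linear drop to
    `depth` over t_d followed by a linear recovery over t_r."""
    if horizon >= t_d + t_r:
        return depth * (t_d + t_r) / 2.0
    if horizon >= t_d:
        u = horizon - t_d  # time spent in the recovery phase
        return depth * t_d / 2.0 + depth * (u - u * u / (2.0 * t_r))
    return depth * horizon * horizon / (2.0 * t_d)


def sample_resilience(rng, horizon=100.0, mean_t_d=5.0, mean_t_r=10.0,
                      levels=(0.2, 0.4, 0.6, 0.8)):
    """One Monte Carlo resilience sample (assumed area-ratio measure)."""
    depth = rng.choice(levels)              # discrete uniform maximum degradation
    t_d = rng.expovariate(1.0 / mean_t_d)   # degradation duration
    t_r = rng.expovariate(1.0 / mean_t_r)   # recovery duration
    return 1.0 - triangle_loss(depth, t_d, t_r, horizon) / horizon


rng = random.Random(1)  # fixed seed so the run is repeatable
samples = [sample_resilience(rng) for _ in range(1000)]
```

The resulting samples would then go through candidate distribution identification, parameter estimation, and the goodness-of-fit test.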

Using our RDIA method, the system resilience distribution can be obtained. Future studies will focus on more possible performance process types.

Supporting information

S1 Data. Experiments data files.

https://doi.org/10.1371/journal.pone.0276908.s001

(ZIP)

References

1. Henry D, Ramirez-Marquez JE. Generic metrics and quantitative approaches for system resilience as a function of time. Reliability Engineering & System Safety. 2012;99:114–122.
2. Bruneau M, Chang SE, Eguchi RT, Lee GC, O’Rourke TD, Reinhorn AM, et al. A framework to quantitatively assess and enhance the seismic resilience of communities. Earthquake Spectra. 2003;19(4):733–752.
3. Zobel CW. Representing perceived tradeoffs in defining disaster resilience. Decision Support Systems. 2011;50(2):394–403.
4. Li R, Dong Q, Jin C, Kang R. A new resilience measure for supply chain networks. Sustainability. 2017;9:144.
5. Guo J, Li Y, Yang Z, Zhu X. Quantitative method for resilience assessment framework of airport network during COVID-19. PLoS ONE. 2021;16(12):e0260940. pmid:34860845
6. Zhang C, Kong J, Simonovic SP. Modeling joint restoration strategies for interdependent infrastructure systems. PLoS ONE. 2018;13(4):e0195727. pmid:29649300
7. Ulusan A, Ergun O. Restoration of services in disrupted infrastructure systems: A network science approach. PLoS ONE. 2018;13(2):e0192272. pmid:29444191
8. Jufri FH, Widiputra V, Jung J. State-of-the-art review on power grid resilience to extreme weather events: Definitions, frameworks, quantitative assessment methodologies, and enhancement strategies. Applied Energy. 2019;239:1049–1065.
9. Pasman H, Kottawar K, Jain P. Resilience of process plant: what, why, and how resilience can improve safety and sustainability. Sustainability. 2020;12(15):6152.
10. Liu W, Song Z. Review of studies on the resilience of urban critical infrastructure networks. Reliability Engineering & System Safety. 2020;193:106617.
11. Pawar B, Park S, Hu P, Wang Q. Applications of resilience engineering principles in different fields with a focus on industrial systems: A literature review. Journal of Loss Prevention in the Process Industries. 2021;69:104366.
12. Ouyang M, Dueñas-Osorio L, Min X. A three-stage resilience analysis framework for urban infrastructure systems. Structural Safety. 2012;36:23–31.
13. Pant R, Barker K, Ramirez-Marquez JE, Rocco CM. Stochastic measures of resilience and their application to container terminals. Computers & Industrial Engineering. 2014;70:183–194.
14. Ba-Alawi AH, Ifaei P, Li Q, Nam K, Djeddou M, Yoo C. Process assessment of a full-scale wastewater treatment plant using reliability, resilience, and econo-socio-environmental analyses (R2ESE). Process Safety and Environmental Protection. 2020;133:259–274.
15. Zinetullina A, Yang M, Khakzad N, Golman B, Li X. Quantitative resilience assessment of chemical process systems using functional resonance analysis method and Dynamic Bayesian network. Reliability Engineering & System Safety. 2021;205:107232.
16. Orosz A, Pimentel J, Friedler F. General formulation for the resilience of processing systems. Chemical Engineering Transactions. 2020;81:859–864.
17. Mou N, Sun S, Yang T, Wang Z, Zheng Y, Chen J, et al. Assessment of the resilience of a complex network for crude oil transportation on the maritime silk road. IEEE Access. 2020;8:181311–181325.
18. Cimellaro GP, Reinhorn AM, Bruneau M. Seismic resilience of a hospital system. Structure and Infrastructure Engineering. 2010;6(1-2):127–144.
19. Crk V. Reliability assessment from degradation data. In: Annual Reliability and Maintainability Symposium; 2000. p. 155–161.
20. Gebraeel N. Sensory-updated residual life distributions for components with exponential degradation patterns. IEEE Transactions on Automation Science and Engineering. 2006;3(4):382–393.
21. Javed K, Gouriveau R, Zerhouni N, Nectoux P. A feature extraction procedure based on trigonometric functions and cumulative descriptors to enhance prognostics modeling. In: Proceedings of IEEE International Conference on Prognostics and Health Management; 2013. p. 1–7.
22. Wang Y, Fu S, Wu B, Huang J, Wei X. Towards optimal recovery scheduling for dynamic resilience of networked infrastructure. Journal of Systems Engineering and Electronics. 2018;29(5):995–1008.
23. Carreras BA, Newman DE, Dobson I, Poole AB. Evidence for self-organized criticality in a time series of electric power system blackouts. IEEE Transactions on Circuits and Systems. 2004;51(9):1733–1740.
24. Weiss HJ, Rosenthal EC. Optimal ordering policies when anticipating a disruption in supply or demand. European Journal of Operational Research. 1992;59(3):370–486.
25. Upadhya KS, Srinivasan NK. Availability of weapon systems with multiple failures and logistic delays. International Journal of Quality & Reliability Management. 2003;20(7):836–846.
26. Myrefelt S. The reliability and availability of heating, ventilation and air conditioning systems. Energy and Buildings. 2004;36(10):1035–1048.
27. Li R, Tian X, Yu L, Kang R. A systematic disturbance analysis method for resilience evaluation: A case study in material handling systems. Sustainability. 2019;11(5):1447.
28. Ebeling CE. An introduction to reliability and maintainability engineering. Tata McGraw-Hill Education; 2004.
29. Zhang D. MATLAB probability and mathematical statistics analysis. Mechanical Industry Press; 2010.
