Embedding resilience in the design of the electricity supply for industrial clients

This paper proposes an optimization model, using Mixed-Integer Linear Programming (MILP), to support decisions related to making investments in the design of power grids serving industrial clients that experience interruptions to their energy supply due to disruptive events. In this approach, by considering the probabilities of the occurrence of a set of such disruptive events, the model is used to minimize the overall expected cost by determining an optimal strategy involving pre- and post-event actions. The pre-event actions, which are considered during the design phase, evaluate the resilience capacity (absorption, adaptation and restoration) and are tailored to the context of industrial clients dependent on a power grid. Four cases are analysed to explore the results of different probabilities of the occurrence of disruptions. Moreover, two scenarios, in which the probability of occurrence is lowest but the consequences are most serious, are selected to illustrate the model’s applicability. The results indicate that investments in pre-event actions, if implemented, can enhance the resilience of power grids serving industrial clients because the impacts of disruptions either are experienced only for a short time period or are completely avoided.


Introduction
Systems such as those for the distribution of electricity, water, oil, material supplies, and electronic communications constitute Critical Infrastructures (CIs), providing fundamental services to the economy and the routine operation of society. Many elements of CIs take the form of networks [1], with dependency among nodes and links, which in turn are usually interconnected with other networks. The efficiency of an entire CI depends on the availability of each element [2]; therefore, the occurrence of undesired and unexpected events, such as natural disasters, bad weather or a combination of other factors, can cause adverse and extended effects on the system, leading to social, environmental and economic impacts, although the probability of such events is usually low [3,4].
Individual scenarios are analysed, demonstrating how the model can be applied to propose an appropriate resilience-based strategy for a specific situation.
Sensitivity analysis is also conducted to evaluate the impact of financial constraints for design investments compared to the overall performance of the power grid and the overall cost. The results demonstrate that higher investments during the design phase, when optimally allocated, have the potential to improve power grid performance and still reduce overall costs.
The remainder of the paper is organized as follows. Section 2 presents the theoretical background on resilience, including different approaches, applications and comparisons with other concepts. Section 2 also introduces useful concepts about EPSNs and their state of the art in the context of resilience. Section 3 shows the characteristics of the EPSN considered and the formulation of the proposed optimization model. Section 4 discusses examples to illustrate the applicability of the model. Finally, Section 5 concludes with remarks.

Theoretical background

The concept of resilience
Resilience assessment requires information about the disruptive events which an entity might be exposed to, such as their likelihood and their expected Impact on the System (IS), enabling the estimation of the resources necessary to bring the system back into operation. IS corresponds to the reduction of the system's ability to perform an assigned function after the occurrence of disruptive events. Given this information, the system's performance should return to its targeted level over time, incurring a Post-interruption Recovery Cost (PCR). In this paper, both IS and PCR are measured as expected costs, weighted with the likelihoods of the disruptive events considered.
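The probability weighting described above can be illustrated with a minimal sketch. The scenario list, the probabilities and the cost figures below are hypothetical values chosen for illustration only; they are not taken from the paper.

```python
# Hypothetical scenarios: (probability, impact cost IS, recovery cost PCR).
# All figures are illustrative assumptions, not data from the paper.
scenarios = [
    (0.90, 0.0, 0.0),             # no failure
    (0.08, 50_000.0, 10_000.0),   # single line failure
    (0.02, 400_000.0, 60_000.0),  # substation failure
]


def expected_costs(scenarios):
    """Return (expected IS, expected PCR), each weighted by scenario likelihood."""
    total_p = sum(p for p, _, _ in scenarios)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    exp_is = sum(p * is_cost for p, is_cost, _ in scenarios)
    exp_pcr = sum(p * pcr for p, _, pcr in scenarios)
    return exp_is, exp_pcr


exp_is, exp_pcr = expected_costs(scenarios)
print(f"expected IS = {exp_is:.2f}, expected PCR = {exp_pcr:.2f}")
```

Rare but severe scenarios can contribute as much to the expected cost as frequent mild ones, which is why the low-probability cases are kept in the scenario set.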
The concept of resilience is concerned with the resistance, flexibility and recovery of an entity [5], emphasizing that actions can be undertaken to mitigate IS. A resilient system is defined by the following capabilities: (i) absorption, the capacity to anticipate, minimize and withstand the consequences of disturbances; (ii) adaptation, the capacity for reconfiguration in undesirable situations; and (iii) restoration, the speed and ease with which the system returns to normal operation [5,19-21]. These three capacities make up the "resilience triangle" [5] and should ideally be considered during the design phase of a system to effectively mitigate IS.
This paper focuses on setting a resilience-based strategy that determines the appropriate pre-event actions that have the potential to minimize IS by considering the capacities for resilience previously presented. The investments associated with these three capacities can be defined as Investments in Design for Resilience (IDR), comprising actions undertaken during the system's design phase that seek to reduce both the impact and the system recovery time, as represented in Fig 1. As seen in Fig 1, a system designed to absorb and anticipate the impact of an unwanted event and to adapt to new conditions might have a low IS, and thus should be more resilient. In addition, the recovery speed is influenced by the investments to return the system to operation quickly. Thus, this paper aims to demonstrate the important interactions between IDR and IS decisions, in which IDR could positively influence system resilience by increasing absorption and adaptation capacities, shortening recovery time and consequently reducing IS.
can perform its required function at a given point of time under a given set of conditions. Traditional risk assessment in turn focuses on the likelihood and consequences of disruptive events, by understanding the nature of potential disturbances, characterizing their negative consequences and mitigating the level of risk which the system is exposed to (e.g., [25,27]). Robustness or vulnerability are often used to measure the extent to which a power grid has high or low reliability [29].
According to Linkov [10], "resilience is not a substitute for principled system design or risk management. Instead, resilience is a complementary attribute that uses strategies of adaptation and mitigation to improve traditional risk management". Panteli and Mancarella [34] in turn argued that the resilience concept encompasses all of the aforementioned concepts. Indeed, because risk assessment results in an understanding and mitigation of the potential disturbances, and robustness/vulnerability evaluation can help to identify weaknesses and candidates for the implementation of actions of resilience enhancement, these two approaches can serve as inputs to resilience analysis during the CI's design phase. In contrast, reliability assessment can measure the effectiveness of a resilience-based strategy over time.
In the field of EPSN resilience, there have been papers in the literature with both qualitative [34,49-51] and quantitative [5,52-56] approaches. For example, Panteli et al. [34] evaluated the impact of weather changes on the reliability, operation and resilience of an electric power network by observing the intensity, frequency and duration of severe weather events and proposing plans to increase EPSN resilience. Ouyang et al. [53] used a probabilistic modelling approach to quantify electrical system resilience and economic losses, given the occurrence of hurricanes, assessing (i) hurricane risk, (ii) fragility, (iii) performance and (iv) restoration. Kim et al. [54] investigated the topological properties of the South Korean Power Grid (KPG), including its resilience. Their study considered node-based and network-based measures to characterize the structural dimensions of a network and to understand its topology and resilience. The results obtained concerning the KPG were compared with random and scale-free reference networks. Finally, several suggestions were made to improve its resilience. Fang et al. [55] considered investments in capacity expansion and backup to evaluate the performance of electrical transmission networks under nominal operations and after deliberate attacks. Dewenter et al. [56] studied the resilience of power-flow models to the failure of a transmission line, with resilience characterized in terms of the "backup capacity", defined as the additional capacity of the links that must be supplied to secure stable operation of the link with the greatest load in case of an attack or a failure in that link. According to Cuadra et al. [29], there are two different approaches to evaluating power grid resilience. The first is solely based on topological concepts, using metrics such as the mean path length, clustering coefficients, efficiency and betweenness centrality [57,58].
The second, a hybrid approach, introduces some electrical engineering concepts in an effort to enhance the topological approach, using metrics such as electrical betweenness and net-ability [56,59-62]. For example, Guohua et al. [59] presented an assessment of the North China power grid based on complex network theory to investigate the tolerance of the power grid to attacks. Pepyne et al. [62] evaluated the resilience of a synthetic Watts-Strogatz network with 200 nodes and 400 links in terms of link attack schemes, disruption of the network and overhead lines.
Due to the increased focus on structural dimensions of resilience, only a limited number of quantitative studies have focused on resilience and the variables that affect system performance, such as the cost of post-disruption operation, customer service and investments in design (Dixit et al. [63]). Therefore, the present work aims to fill this gap by assessing the resilience of power grids in meeting customer demand, not only by designing a system with increased resilience, but also by identifying how much resilience is improved when considering different possible methods to invest in the design of a system. Furthermore, even with the variety of applications of resilience, to the best of the authors' knowledge, the aforementioned articles do not consider the impact of disruptions to the electricity supply on industrial clients. Indeed, most of the resilience literature has overlooked differences among customers and their needs. Thus, our goal is to assess power grids' resilience with a focus on the industrial client perspective (Kwasinski [64]). Therefore, the proposed framework is intended to establish a "view of the grid" from the perspective of an industrial client; thus, our focus is not to address different types of failures in the main electrical power grid but to improve the resilience of the power supplies connected to industrial clients.
In this context, the main contribution of our work is to propose an optimization model using MILP to make decisions related to investments in the design of power grid resilience with a focus on the customer perspective. Our paper evaluates how costs associated with investments in the design phase can reduce both the impact and recovery efforts over time, given the occurrence of an undesired event. In other words, we can now determine how financial resources should be spent to design a resilient power grid. In this manner, we provide a glimpse into the decisions that consumers of electric power can make that influence the resilience of the overall system.
Additionally, in contrast to [29,41,53-62,64], this article evaluates the performance of the electricity supply over time by examining the evolution of the impact of disruptive events on the system and its response. Despite the importance of considering this factor, the vast majority of work on power grids has not included the time dimension in its analyses of resilience [5,65,66].

Scope of the analysis
In this section, we first describe the main characteristics of an electrical grid to contextualize the scope of our analysis. The bulk power system is generally designed in accordance with the N-1 security criterion, requiring that the system be able to bear the loss of one major component (mainly transmission lines and power transformers) without interrupting the electricity supply [33]. Moreover, typical distribution networks usually have interconnected feeders that can be automatically and/or manually switched on in case of failures.
In contrast, the electric power supply to industrial clients is usually provided by a single connection line and a step-down substation, and failures in this infrastructure can cause power supply interruptions and therefore additional production costs. Given this fact, the scope of our analysis is highlighted in Fig 2, representing our focus on the user perspective. Thus, disruptions of the system are analysed in terms of interruptions of the electricity supply to industrial clients that, for instance, serve critical societal functions. Fig 2 contains a set of Subtransmission Substations (SSs) denoted by SS j , which are responsible for ensuring energy supply to industrial clients, denoted by C i , where i > n 2 + 1 >. . .n 1 > 1, through subtransmission lines. Under normal conditions, each client has a demand Q i , which is served by a specific SS j (primary assignment) with capacity K j to accommodate the demand assigned to it. The electrical connection of industrial clients shown in Fig 2 could be generalized to other configurations. For example, it could involve a different number of SSs or industrial clients, which would only require modifying the allocation of clients per SS.
In this manner, the proposed model provides some alternatives to improve the resilience of the electric power supply for industrial plants, including normally open backup power lines, active parallel lines, purchasing of diesel generators, or increases in restorative capacity. However, the implementation of these reinforcements, in practice, depends on the costs of expanding the electrical connection and the expenditures arising from interruptions to the energy supply. Thus, based on some input data regarding a set of industrial plants, the solution proposed by the model indicates whether these alternatives should be implemented.

Modelling assumptions
This paper proposes an optimization model to minimize the total expected costs by means of implementing resilience-based alternatives that are useful in case of stoppages of the supply of electrical energy to industrial clients due to disruptions in the configuration analysed in Fig 2. The stochastic characteristic of the proposed model relies on considering different disruptive scenarios (each with its own probability of occurrence) in the electrical connections to industrial clients; the probabilities of occurrence are used in the calculation of the total expected cost.
There are myriad events that might cause disruptions in the electricity supply, for example, climate change [67], natural disasters [51,68], physical attacks [69] and terrorism [50]. However, this paper does not intend to consider every possible contingency or to model the causes of the disruptive events that affect power supplies to industrial clients. To evaluate different scenarios of disruption, we consider the following assumptions for the power grid in Fig 2:
• SS j can be affected by an event that will partially or fully impact its capacity, thereby influencing the supply of the set of customers C i assigned to it. This capacity will be recovered over time in accordance with the recovery rate of the system;
• The subtransmission line between SS j and a connected C i can be affected, thus halting only the supply of C i ;
• Multiple failures can occur, affecting SS 1 and SS 2 , two subtransmission lines, SS j and a subtransmission line not connected to it, or a subtransmission line and its corresponding SS j .
Thus, the method will determine the optimal allocation of resources to minimize the overall expected costs for designing this power grid, assuming that an undesirable scenario could occur. In addition to the post-event response (i.e., efforts to restore the supply of energy to industrial plants), we consider pre-event decisions related to investments in improving resilience, which can be accomplished by including absorptive, adaptive and restorative capacities [1,5,11] in the phase of designing the electrical connections to industrial customers. The idea is to incorporate the concept of resilience into the design of the system, thereby considering different possibilities of IDR and the respective IS and PCR.
This problem gives rise to an MILP approach, for which the parameters and variables are described in Tables 1 and 2, respectively. The binary variables are set so that 1 indicates the existence or operation of some SS or link of the system and 0 otherwise.

Design phase: Pre-event costs
The options available for pre-event investments are translated into costs defined as IDR, and they are divided into three types of capacity: adaptation, absorption and restoration. Considering possible system interruptions and according to the adaptive concepts presented in [1,5], the possibilities for increasing the adaptive capacity are the following.
• To establish a backup line between SS k and C i so that the impact on the industrial plant operation will be reduced. Indeed, if SS j is affected, its demand can be supplied by SS k , with k ≠ j.
Determining which SS k would work as a backup for C i will be based on the cost to establish the new connection. Backup lines are deemed to operate in hot standby mode.
• To build a redundant line that shares a load with the main line (active parallel) to ensure the supply of C i from its corresponding SS j . The model will determine the existence (or not) of this line so that, if the main line is affected, the redundant one will be able to support the full load.
• To invest in diesel generators to keep the plant at partial or full operation until the main power supply returns. Failures on demand of the diesel generators are not considered here.
Investments in absorptive capacity can be made by expanding the capacity of SS j so that the system will be able to better respond to an event that could affect subtransmission substations or links. The opportunity to invest and expand the capacity of each SS k allows the system to more easily bear the loss of one or more SS j (k ≠ j) because the system will have additional capacity to manage the additional demand of SS j , and consequently will continue to meet demands (partially or totally). The investments in restorative capacity will be spent on deploying additional maintenance crews and buying spares to increase the recovery rate. Considering the available options, the IDR can be expressed as shown in Eq (1):

The first part of Eq (1) corresponds to investing in absorption, which is the possibility of adding capacity to each SS j . The next three terms correspond to possible investments in adaptive capacity: installing generators for C i , establishing backups for clients so their demands can be met by another SS (besides their primary supplier) and the possibility of setting a redundant line between C i and SS j , respectively. The last term corresponds to the investment in increasing the recovery rate.
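The five-term structure described above can be written schematically as follows. This is only a sketch of the structure, not the paper's Eq (1). The variables B, H and w follow the names used in the text for backup lines, redundant lines and additional recovery resources. The added capacity e_j, the generator indicator G_i, the primary substation index j(i) and all cost coefficients c are notation assumed here for illustration.

```latex
% Schematic form of the IDR cost (assumed notation; see the lead-in).
\[
\mathrm{IDR} \;=\;
\underbrace{\sum_{j} c^{\mathrm{cap}}_{j}\, e_{j}}_{\text{absorption}}
\;+\;
\underbrace{\sum_{i} c^{\mathrm{gen}}_{i}\, G_{i}
\;+\; \sum_{i}\sum_{k \neq j(i)} c^{\mathrm{bk}}_{ik}\, B_{ik}
\;+\; \sum_{i} c^{\mathrm{red}}_{i}\, H_{i\,j(i)}}_{\text{adaptation}}
\;+\;
\underbrace{c^{\mathrm{rec}}\, w}_{\text{restoration}}
\]
```

The grouping mirrors the order of the explanation: one absorption term, three adaptation terms (generators, backup lines, redundant lines) and one restoration term.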

Post-event costs
Post-event expected costs are associated with the financial impact of IS caused by a disruptive scenario on system performance and the efforts (PCR) to restore the system supply capacity. We consider that the losses of industries are a step-change function of the demand that is not met in period t for scenario c and for each type of client. That is, a monetary penalty is incurred for each unmet MVA.
We also consider that industrial plants manufacture products, which have different added values; thus, the penalty depends on the specific industrial sector. Therefore, IS can be specified as the impact on the demand supply, and it is expressed in Eq (2): where p c is the probability of each scenario c, which corresponds to a disturbing event that causes an interruption to the energy supply. The first and second terms of Eq (2) represent the cost of supplying power from SS j and diesel generators, respectively. The third term reflects the penalty incurred because the main SS did not meet some portion of clients' demands. The fourth term represents an additional fee for unmet demand beyond deadline d, which is usually established in the contract signed with the client. In this manner, if the supplier fails to meet such a time limit, there will be additional costs on top of the existing penalties. Despite its importance, this penalty structure is not considered in the works mentioned above. The fifth term corresponds to the penalty for not meeting some portion of clients' demands when these clients have diesel generators. As before, there is a sixth term, which is an additional fee that is charged if the non-supply of power extends beyond d. We considered the fifth and sixth terms of Eq (2) because of the specific characteristics of the production processes. Usually, industries suffer great losses due to failures in the power supply even if interruptions are short. Equipment such as reactors, homogenizers, blast furnaces and other critical items do not simply return to their operational state when the power supply is re-established. Further losses relate to the inputs left unprocessed in the equipment (work-in-process) because of the interruption, which usually can no longer be made to the full set of specifications. Such a failure indicates that there has been a lack of control in the process.
Moreover, even when the energy returns, there are production losses until the process returns to the default condition.
Therefore, the possibility of using a diesel generator can reduce the impacts caused by this problem and keep the equipment in operation to remove, for example, the material in process until power is restored, thus reducing the costs incurred by this interruption. In this case, the plant would be penalized only with the loss of production during this period and would no longer suffer losses due to the time spent on re-establishing process control. Therefore, the penalties that might be associated with the lead time when the power supply is cut and the generators are started will not be considered. Note that this approximation is reasonable given that the generators are equipped with automatic start, which usually takes 10 to 30 seconds to become operational.
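The penalty logic described for IS (a per-MVA penalty in every period, plus an additional fee once the interruption extends beyond the deadline d) can be sketched for a single client and a single scenario. The function name, the rates and the unmet-demand profile are hypothetical. The lower rates used for the case with a generator are an assumption made to reflect the reduced losses discussed above, not values from the paper.

```python
def expected_penalty(unmet, prob, penalty_per_mva, late_fee_per_mva, deadline):
    """Expected penalty for one client in one scenario.

    unmet[t-1] is the unmet demand (MVA) in period t = 1, 2, ...
    Periods after `deadline` incur the additional late fee as well.
    """
    cost = 0.0
    for t, mva in enumerate(unmet, start=1):
        cost += penalty_per_mva * mva
        if t > deadline:
            cost += late_fee_per_mva * mva
    return prob * cost


# Illustrative numbers only: unmet MVA over four periods, deadline d = 2.
unmet_profile = [10.0, 10.0, 5.0, 0.0]
no_gen = expected_penalty(unmet_profile, 0.02, 100.0, 50.0, deadline=2)
with_gen = expected_penalty(unmet_profile, 0.02, 40.0, 20.0, deadline=2)
print(no_gen, with_gen)
```

In the full model the same structure is summed over clients, periods and scenarios, with the generator and non-generator penalties applied to the respective unmet-demand variables.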
PCR, in turn, includes the costs associated with the resources required to recover the system due to disturbances, i.e., the cost of restoring the performance of the system after an interruption c. The expected PCR is shown in Eq (3): where the first part indicates the costs associated with the use of recovery resources, if the recovery actions are directed to SS j , and the second term represents the cost associated with recovering a subtransmission line between SS j and C i .

Formulation of the model
The stochastic optimization model proposed is defined as an MILP problem with an objective function that combines the cost of investing in resilience-based actions in the network design phase (IDR) and the expected costs related to system performance and recovery (IS plus PCR). Thus, the objective function (Eq (4)) is the sum of IDR, IS and PCR, which are presented in Eqs (1), (2) and (3), respectively.
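To make the trade-off in the objective function concrete, the sketch below enumerates the design options of a deliberately tiny, hypothetical instance (one client, two binary investment decisions, three scenarios) and picks the one with the lowest IDR + IS + PCR. It uses brute-force enumeration rather than an MILP solver, and every cost, probability and outage duration is an assumed value. The simplification that a backup line fully avoids the outage is also an assumption.

```python
from itertools import product

# All values below are illustrative assumptions.
GEN_COST = 30_000.0             # IDR: diesel generator
BACKUP_COST = 80_000.0          # IDR: backup line to another SS
PENALTY_PER_HOUR = 100_000.0    # loss per hour without any supply
PENALTY_PER_HOUR_GEN = 5_000.0  # reduced loss while the generator runs
REPAIR_COST = {"none": 0.0, "line": 10_000.0, "ss": 40_000.0}
# scenario name -> (probability, outage duration in hours)
SCENARIOS = {"none": (0.90, 0), "line": (0.08, 6), "ss": (0.02, 24)}


def total_expected_cost(gen, backup):
    """IDR plus expected IS plus expected PCR for one design (gen, backup)."""
    idr = gen * GEN_COST + backup * BACKUP_COST
    expected_is = 0.0
    expected_pcr = 0.0
    for name, (p, hours) in SCENARIOS.items():
        if backup:
            hourly = 0.0  # assumption: backup line fully avoids the outage
        elif gen:
            hourly = PENALTY_PER_HOUR_GEN
        else:
            hourly = PENALTY_PER_HOUR
        expected_is += p * hourly * hours
        expected_pcr += p * REPAIR_COST[name]
    return idr + expected_is + expected_pcr


# Enumerate the four designs and keep the cheapest one.
designs = list(product((0, 1), repeat=2))
best = min(designs, key=lambda d: total_expected_cost(*d))
for d in designs:
    print(d, round(total_expected_cost(*d), 2))
print("best (gen, backup):", best)
```

Enumeration is only workable for a handful of binary decisions. With many clients, substations, periods and scenarios the design space grows exponentially, which is why the problem is posed as an MILP.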
subject to:

Constraint 5 is related to the limit of connections per client, assuming that each client can have only one backup connection at most. We assume this limit because (i) the cost of implementation of a backup power line is higher than that of a diesel generator; (ii) multiple backup lines would require increased space, which is not always feasible, mainly near urban areas; and (iii) finally, provided that the substation is operational, a single line would provide all of the energy needed to supply the industrial plant, while it would require multiple generators to have the same outcome.
Constraints 6-14 are associated with meeting the clients' demand. The demand of each client can be served by the corresponding SS j and its diesel generators (Constraint 6) so that the demand of C i can only be served by SS j , assuming this link exists and is operational (Constraint 7). Therefore, the portion of C i demand served by generators can only exist if generators have been installed in C i (Constraint 8), and this amount cannot exceed the capacity of the generators (Constraint 9). In addition, the whole demand that SS j is expected to meet cannot exceed its capacity (Constraint 10). Constraint 11 represents the portion of C i demand that is not supplied in each period, which can occur if either SS or the generators do not have sufficient capacity. If C i generator is activated, information represented by g itc , the unmet portion of C i demand is represented by h itc (Constraint 12); otherwise, it will be represented by y itc (Constraint 13). Consequently, the portion of each client's demand that is met and the portion that is not met in each period are complementary factors (Constraint 14).
Generators can only be activated if the subtransmission system for C i has been affected, given that the investment in their acquisition has been made (Constraint 15). In this context, the predefined subtransmission system operates in series such that, if any component that provides energy for C i is affected, the power does not reach C i . Therefore, Constraints 16 and 17 correspond to the connection between SS j and C i in accordance with the operational condition of each component of this system. The connection is operational if and only if at least K j of the capacity of SS j has been recovered (Constraint 17). Moreover, Constraint 16 represents the operation of the connection between SS j and C i , considering the following:
1. If C i is primarily connected to SS j , this connection might or might not be operational (O ijtc );
2. If C i is primarily connected to SS j , this connection could be ensured by a redundant line (H ij ); and
3. If C i is not primarily connected to SS j , SS j might be its backup (B ij ).
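The series logic behind the connection constraints can be sketched as a Boolean function for one client in one period. The function and argument names are hypothetical. The sketch also assumes that a redundant line and a backup substation are unaffected by the event that hits the primary line or substation, which the model itself handles scenario by scenario.

```python
def client_supplied(ss_ok, line_ok, has_redundant, has_backup, backup_ss_ok):
    """Return True if power can reach the client in a given period.

    ss_ok         -- primary substation has recovered enough capacity
    line_ok       -- primary subtransmission line is operational (O)
    has_redundant -- a redundant parallel line was built (H)
    has_backup    -- a backup line to another substation was built (B)
    backup_ss_ok  -- that other substation is operational
    """
    # Series system: the primary path needs the SS and at least one working line.
    primary_path = ss_ok and (line_ok or has_redundant)
    # The backup path only exists if the investment was made.
    backup_path = has_backup and backup_ss_ok
    return primary_path or backup_path


# Line failure, no investments: the client is cut off.
print(client_supplied(True, False, False, False, True))
# Same failure with a redundant line: supply is kept.
print(client_supplied(True, False, True, False, True))
# Substation failure with a backup line to a working SS: supply is kept.
print(client_supplied(False, True, False, True, True))
```

In the MILP these Boolean relations are expressed as linear inequalities over binary variables rather than as logical operators.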
Constraints 18-20 register the state (whether operational or not) of the subtransmission line between SS j and C i , given that it is a primary connection (Constraint 19), and this line is subject to the occurrence of events that can affect its performance. A portion of each line (a ijtc ) can be recovered in each period and for a given scenario using the recovery rate ℓ and these lines must be fully recovered over time (Constraints 18 and 20) using the available resources (Constraint 25), which are shared among all subtransmission lines.
Constraints 21-23 represent the determination of the capacity of SS j , given that an event affects its operation, and its capacity must be recovered over time. Immediately after the occurrence of the disruptive event, SS j has reduced capacity or no capacity at all. Thus, restoration efforts can be undertaken by increasing capacity by r (the recovery rate parameter in MVA/hour). This process continues, with recovery efforts being made hourly so that the entire capacity is recovered by the time T is reached. In the model, the SS recovery rate can be increased using additional resources (variable w), which should be devoted to hiring maintenance crews and buying spares.
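The hourly recovery process described above can be sketched as a simple simulation. The function name and all numerical values are hypothetical. Treating the effect of additional resources as a fixed increase in the hourly rate is a simplifying assumption, not the paper's formulation.

```python
def recovery_profile(capacity, remaining, base_rate, extra_rate, horizon):
    """Available capacity (MVA) of a substation at hours 0..horizon.

    capacity   -- nominal capacity K (MVA)
    remaining  -- capacity left immediately after the event (MVA)
    base_rate  -- recovery rate r (MVA/hour)
    extra_rate -- additional rate bought at design time (assumed additive)
    """
    rate = base_rate + extra_rate
    profile = [remaining]
    for _ in range(horizon):
        # Capacity grows by `rate` each hour but never exceeds the nominal value.
        profile.append(min(capacity, profile[-1] + rate))
    return profile


# Illustrative case: a 50 MVA substation fully lost, 12-hour horizon.
slow = recovery_profile(50.0, 0.0, 5.0, 0.0, 12)
fast = recovery_profile(50.0, 0.0, 5.0, 5.0, 12)
print("hours to full capacity, base rate: ", slow.index(50.0))
print("hours to full capacity, extra crews:", fast.index(50.0))
```

The area between the nominal capacity and such a profile is what drives the unmet demand, so investing in restorative capacity shrinks IS by shortening the recovery time.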
Constraint 24 corresponds to the total resources available to recover SS and must be shared among all SSs. In addition, the costs associated with IDR and PCR cannot exceed the limit M, as shown in Constraint 26, which represents financial constraints. Constraints 27-29 specify the variation ranges of the variables as being non-negative integer, non-negative real and binary, respectively.
We demonstrate the applicability of the proposed model. Our aim is to evaluate how the strategies for improving resilience vary for a wide range of scenarios and for different investment options, assessing the corresponding impacts over time. In addition, the example is useful for discussing the validation and verification of the model.

Application example
Description of the problem. This section discusses the application of the proposed model to an example involving an EPSN with industrial clients from the chemical/petrochemical, food and manufacturing sectors. As mentioned above, this paper does not aim to consider every possible contingency over the whole power supply network. In fact, our aim is to improve the resilience of the power supply with regard to industrial clients' connections to the electrical power grid. This situation is of practical application for medium to large industries that have very high costs (and thus very low tolerance) when interruptions to the power supply occur in their production plants. Therefore, alternatives that improve the resilience of industrial clients' connections to the EPSN are provided. Fig 3 shows the power grid that will be addressed in this section. In Fig 3, clients are represented according to their sectors.
In this example, the power grid consists of 3 substations that together supply 150 MVA (Table 3) to industrial customers such that the capacity of each SS is given by the total demand assigned to it. Having both the added value of the products and the eventual loss of production as criteria, the chemical/petrochemical, manufacturing and food industries are ranked in this order, according to their level of importance to local economic activity. Thus, the energy supplier incurs different penalties for demand not supplied because of a disruption in the performance of the system.  To show how disruptions in the network can affect the investments necessary to achieve an optimal, resilient design, we defined a set of scenarios and their associated probability p c . These scenarios are used to specify the loss of SS supply capacity and the loss of subtransmission lines between SS and its clients.
As discussed above, interruptions can occur due to internal or external factors, including various natural factors. For example, in Brazil, atmospheric discharges and torrential rains, combined with falling trees, can interrupt the power supply to industrial clients. According to [30], the disaster probabilities are difficult to quantify. However, for this example, we are not concerned with identifying and analysing specific causes of events that could affect the network. In fact, our aim is to quantify several ways by which the system might become unavailable.
In this context, the proposed method for defining p c considers the observation of the network as a random experiment, for which three possible situations can arise: (i) no occurrence of a disruptive event; (ii) a single failure; or (iii) multiple failures. A single failure is understood as the loss of a node (SS) or a link (subtransmission lines). Multiple failures can be observed as (i) simultaneous failures: SS 1 and SS 2 , two subtransmission lines, or an SS and a subtransmission line not connected to it; or (ii) cascading failures, since failures in both a line and its respective SS are a sort of cascading failure and cannot be considered independent events. Thus, the costs related to their recovery should also be considered. Observations of three or more simultaneous failures are not considered because they are very unlikely to occur.
We also consider simultaneous failures in both SS 1 and SS 2 because they are assumed to be connected to the same step-down Transmission Substation (TS). Thus, this failure could be related to a common cause, such as the loss of TS supply. However, we do not consider other joint failures of SS because they are very unlikely to occur, especially if they are connected to independent TSs. Table 4 shows that each element of the sample space (Ω) is related to a scenario, which represents how an undesired event can impact the supply of electricity to industrial clients; all scenarios are assumed to be mutually exclusive. Furthermore, scenario {S 1 S 2 } (related to a TS failure) is considered less likely than joint failures {S j LP k }, {S j LM m } and {S j LF r } with k, m and r connected to j, which in turn are considered as probable as {S j }. Additionally, {S j } is less likely than scenarios {LP k }, {LM m } and {LF r }, which represent the disconnection of single lines. Such an assumption is based on the practice that a TS is designed with a more robust bus or better switching schemes, compared to an SS [70].
Given this assumption, we can establish relationships among the probabilities of occurrences of these scenarios. More specifically, if x is the probability of the scenario {S 1 S 2 }, the probabilities of the single-failure scenarios and of the joint failures of an SS and a line connected to it are written as multiples of x using the positive constants c 1 and c 2 (see Table 4). Moreover, the probabilities of the scenarios with simultaneous failures (except {S 1 S 2 }, {S j LP k }, {S j LM m } and {S j LF r } with k, m and r connected to j and with the corresponding probabilities defined above) are given by multiplying the probabilities of their respective single scenarios. For example, if c 2 x is the probability of {LP k }, then the probability of {LP k LP q } with q ≠ k is (c 2 x)^2. In this manner, Table 4 shows the scenarios and their respective probabilities. Note that scenarios {S 1 }, {S 2 } and {S 3 } are equally likely. Thus, scenario type {S j }, j = 1, 2, 3, represents three different scenarios with similar definitions and likelihoods (each corresponding to the failure of one SS). The number of similar scenarios is also indicated in Table 4, which shows a total of 209 possible scenarios. Then, the event "no occurrence of a disruptive event (no failure)" is considered complementary to the other failure scenarios.
The consequences of each of these scenarios are different; for example, scenarios {LP 1 LP 2 } and {LP 2 LP 3 } are equally likely, but their effects can differ because P 1 and P 2 are connected to SS 1 , whereas P 3 is connected to SS 2 . Thus, all scenarios should be incorporated into the optimization problem.
In this context, we analysed four cases, for each of which all of the scenarios shown in Table 4 were considered. The different cases were defined based on the probability of the scenario {no failure}. Thus, x is estimated by the definition of the probability of {no failure} and using the property that the sum of probabilities of all scenarios equals 1. For the positive constants c 1 and c 2 with 0 < c 1 < c 2 , the computation of x is always possible. Therefore, having obtained x, the probability of the other scenarios can be estimated using the relations given in Table 4.
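The computation of x described above can be sketched in a few lines. This is only an illustration under stated assumptions: scenario types are grouped into those whose probability is linear in x and those whose probability is quadratic in x (products of two single-failure probabilities), and the scenario counts used in the example are hypothetical placeholders, not the counts of Table 4. The function name `solve_x` is likewise introduced here for illustration.

```python
import math

def solve_x(p_no_failure, linear_terms, quadratic_terms):
    """Return x >= 0 such that all scenario probabilities sum to 1.

    linear_terms:    list of (count, coeff), each scenario has probability coeff * x
    quadratic_terms: list of (count, coeff), each scenario has probability coeff * x**2
    """
    b = sum(n * c for n, c in linear_terms)
    a = sum(n * c for n, c in quadratic_terms)
    target = 1.0 - p_no_failure  # probability mass left for failure scenarios
    if target == 0.0:
        return 0.0
    if a == 0.0:
        return target / b
    # Positive root of a*x^2 + b*x - target = 0; it exists whenever a, b, target > 0.
    return (-b + math.sqrt(b * b + 4.0 * a * target)) / (2.0 * a)

c1, c2 = 10.0, 100.0
# Hypothetical counts: 1 scenario with probability x, 3 with c1*x,
# 10 single-line scenarios with c2*x, 45 line pairs with (c2*x)^2.
linear = [(1, 1.0), (3, c1), (10, c2)]
quadratic = [(45, c2 ** 2)]
x = solve_x(0.9, linear, quadratic)
print(f"x = {x:.3e}")
```

Because the discriminant is strictly positive for positive coefficients, a positive root always exists, which is consistent with the statement that x can always be computed for 0 < c 1 < c 2 .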
We cannot predict exactly which adverse events will occur, when they will occur or with what intensity. Nevertheless, given that our approach anticipates the resilience pre- and post-event actions that should be considered, using the probabilities of disruptive events is a method to represent their intrinsically uncertain nature, and doing so also permits the calculation of the expected cost, which is a measure that can guide how resources should be allocated to enhance resilience. In the next section, we present examples of applying the proposed model, which was solved using IBM ILOG CPLEX software with the exact Branch-and-Cut technique (Hillier and Lieberman) [71].

Results and discussion
The probability of scenario {no failure} and the corresponding x for each of the 4 cases are shown in Table 5. Note that we consider P{No failure} = 0.9, 0.7, 0.3, 0.0 for cases 1, 2, 3 and 4, respectively. In other words, we assume that the probability of a disruptive event is low in case 1. Next, we increase this probability in cases 2 and 3. Finally, we analyse in case 4 a situation in which a disruption will occur for certain. These cases were defined to evaluate the behaviour of the system over T = 8 hours and the response of the model to different possibilities. In all cases, we considered c 1 = 10 and c 2 = 100, i.e., the failure of a subtransmission line is ten times more likely than the failure of an SS or of a line and its respective SS. The results presented in this section were obtained disregarding financial resource constraints; that is, we disregard Constraint 26 to achieve an optimal resilience strategy with unconstrained financial resources. We also perform sensitivity analysis to assess the impact of limited budgets on the optimal resilience strategy and hence on system performance (see next section). The parameter values for the proposed model shown in Table 6 are fictitious for the sake of confidentiality. However, they were carefully estimated to represent reality.
The comparison between the results in terms of IS, IDR and PCR obtained for each of the four cases is shown in Fig 4, where the total expected costs are presented. Fig 4 illustrates that in case 1, which has a low probability of occurrence of any disruptive event, no investments in resilience are necessary. In fact, one can state that, when the probability of scenario {no failure} is high, the model does not suggest investments in resilience.
Moreover, in analysing Fig 4, we observe that, as the probability of scenario {no failure} decreases, the total expected cost considerably increases. In fact, comparing cases 1 and 2, the expected total cost was approximately 5 times greater in case 2 than in case 1. Additionally, compared to case 1, the total expected cost of case 4 increased drastically from $ 716,370 to $ 5,049,530. This significant increase is justified by the increases in IS, PCR and IDR values as the probability of {no failure} decreases. For case 1, the highest penalties (related to unmet demand) are observed in scenarios {LP1}, {LP2}, {LP3} and {LP4}, comprising 17% of the total expected penalty. For case 2, there was an investment of $ 1,400,000 in IDR.   Table 6. The investment in this resilience-based alternative assures that P 4 has its demand fully met when its main subtransmission line is affected. Consequently, the penalty for the {LP 4 } scenario decreased from $86,000 in case 2 to zero in case 3. In addition, this design feature, while maintaining the operation of the system, is also used to share the workload with the main subtransmission line. It is important to note that, in practice, the design and installation of redundant lines connected to the same SS consider a distance criterion to avoid one tower falling onto an adjacent line.
In case 4, the solution of the model suggested active parallel subtransmission lines for clients P 1 and P 2 (Fig 6). Comparing cases 3 and 4, after investing in redundant subtransmission lines, the penalty related to unmet demand for the {LP 1 } and {LP 2 } scenarios decreased from $ 362,000 in case 3 to zero in case 4. As in case 2, there was also a recommendation to invest in restorative capacity for cases 3 and 4, causing an increase in the SS recovery rate of 5 MVA/hour; i.e., it increased from 20 MVA/hour to 25 MVA/hour. Investment in active parallel subtransmission lines and in restorative capacity seems reasonable since the probability of each scenario remains low, although the probability that an event could impact a subtransmission line is considered to be ten times greater than the probability of an event that could affect an SS. However, although the cost of adding a single 2 MVA diesel generator is approximately 25% less, this action would not be as efficient as the parallel active subtransmission line in cases 3 and 4 because it would not enable the system to supply the client's entire demand. For example, a petrochemical client would have to invest in eight generators to ensure that its demand was met during a disruption, and the cost of this action would be approximately six times greater than that of investing in an active parallel subtransmission line.

Assessment of the constraint on financial resources
In this section, we evaluate the impact of budget constraints on defining the optimal resilience-based strategy and hence on system performance. In the proposed model, the financial constraint is represented by the parameter M, which limits the investments in resilience enhancement actions (pre-event actions) and the costs associated with post-event recovery (see Constraint 26). As shown in Fig 7, as M decreases, the cost associated with the impact on the system (IS) increases. For example, from the "without restriction" case to M = $ 1 million and M = $ 0.5 million, IS increases by approximately 15% and 44%, respectively. Consequently, the total expected cost also increases. Therefore, the reduction in M directly impacts the decisions on drawing up a resilience-based strategy and hence on the system performance to meet demands.
Note also that PCR does not change in the situations presented in Fig 7 because (i) all of them represent the same case 4, with all 209 scenarios and their respective likelihoods, and (ii) the system must fully recover over the time period of 8 hours (see Constraints 20 and 23). Thus, it is important to note that increasing IDR does not indicate that the PCR will be reduced because a certain total amount of resources will always be needed to perform the recovery actions associated with the disruptive event, regardless of IDR.
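The qualitative effect of the budget M on IS can be reproduced with a toy example. The sketch below is not the MILP of this paper: it enumerates all subsets of three hypothetical pre-event actions instead of using Branch-and-Cut, it applies the budget to IDR only (whereas Constraint 26 also covers post-event recovery costs), and every number, action name and the additive "avoided fraction" rule are assumptions made purely for illustration.

```python
from itertools import combinations

# Hypothetical data (costs in k$): a single scenario that occurs with probability 1.
SCENARIOS = {"S1": (1.0, 6000.0)}  # name: (probability, penalty if nothing is done)
ACTIONS = {
    "backup_to_SS3":   {"cost": 2000.0, "avoids": {"S1": 0.75}},
    "diesel_generator": {"cost": 1000.0, "avoids": {"S1": 0.50}},
    "faster_recovery": {"cost": 500.0,  "avoids": {"S1": 0.25}},
}

def evaluate(chosen):
    """Return (total expected cost, IDR, expected IS) for a set of actions."""
    idr = sum(ACTIONS[a]["cost"] for a in chosen)
    expected_is = 0.0
    for name, (prob, penalty) in SCENARIOS.items():
        # Assumed rule: avoided fractions add up and are capped at 100%.
        avoided = min(1.0, sum(ACTIONS[a]["avoids"].get(name, 0.0) for a in chosen))
        expected_is += prob * penalty * (1.0 - avoided)
    return idr + expected_is, idr, expected_is

def best_strategy(budget):
    """Exhaustively search action subsets whose IDR does not exceed the budget."""
    best = None
    names = list(ACTIONS)
    for r in range(len(names) + 1):
        for chosen in combinations(names, r):
            total, idr, expected_is = evaluate(chosen)
            if idr <= budget and (best is None or total < best[0]):
                best = (total, idr, expected_is, chosen)
    return best

for m in (float("inf"), 2000.0, 1000.0, 0.0):
    total, idr, expected_is, chosen = best_strategy(m)
    print(f"M={m}: total={total}, IDR={idr}, IS={expected_is}, actions={chosen}")
```

In this toy instance the expected IS grows from 0 to 1500, 3000 and 6000 as the budget shrinks, mirroring the trend reported in Fig 7; the magnitudes themselves carry no meaning.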

Further assessments: Evaluating specific scenarios
It is also important to emphasize the flexibility that the model offers to propose solutions for a given particular event. Thus, we analyse two different scenarios to identify the optimal resilience-based strategy considering the occurrence of (i) failure of SS 1 (scenario {S 1 }) and (ii) simultaneous failure of SS 1 SS 2 (scenario {S 1 S 2 }). We believe that these disruptions are related to severe consequences; thus, we analyse the resilience actions that are appropriate for each of them. To this end, for each scenario, we consider its probability of occurrence equalling 1; thus, the other events in Table 4 will not occur.
Assessment of failure of substation SS 1 . We evaluate this scenario for 4 investment possibilities. First, we disregard financial resource constraints (the "without restriction" case). Next, we consider M = $ 4 million and M = $ 2 million. Finally, we consider the worst-case situation with no investments in resilience enhancement actions (the "without IDR" case); the results are shown in Fig 8. As in the previous case, Fig 8 also shows that, when M is reduced, the cost associated with the impact on the system (IS) and the expected total cost both increase, which affects the decisions made in drawing up the resilience-based strategy. Thus, the expected total cost for the "without IDR" case is almost six times greater than that for the "without restriction" case. Fig 9 shows the investments that should be made to enhance power grid resilience for each budget. These investments are assessed according to the performance of the SS recovery and the extent to which the supply of electricity meets the client's demand, which is directly affected by the resilience actions undertaken during the downtime of the corresponding SS. We evaluate the impacts on clients P 1 and F 1 , considering the portion of their demands supplied in scenario {S 1 }; these clients were selected to evaluate performance in supplying power to the industrial sector. The recovery speed of SS 1 and the costs associated with PCR and IS are also illustrated in Fig 9. According to Fig 9, higher budgets (M) emphasize investment to minimize the portion of unmet demand, while lower budgets show increased IS. In contrast to the previous cases, note that, when we consider the unavailability of SS 1 , the investments for the "without restriction" case yield improvement in the absorptive and adaptive capacities.
Indeed, we can see in Fig 9 that the model suggests that (i) 6 backup connections should be established (P 1 , P 2 , F 1 , F 2 , F 3 and F 4 ) so that the clients can be supplied by SS 3 and (ii) additional capacity should be added to SS 3 so that it will be able to supply the additional demand.
Because SS 1 clients would be fully supplied by SS 3 (Fig 9b), recovery of SS 1 would only be completed in T = 8 h (Fig 9a), as Constraint 23 requires.
Fig 9. Figures (a, c, e, g) show the capacity recovery of SS 1 and post-interruption cost recovery (PCR) for M = "without restriction", $ 4 and $ 2 million and "without IDR". Figures (b, d, f, h) present the supply portion that meets the demand of customers P 1 and F 1 and IS for M = "without restriction", $ 4 and $ 2 million and "without IDR". In addition, for each M, there is a list of resilience strategies employed on the left side of each figure.

Assessment of the simultaneous failure of substations SS 1 and SS 2 .
Although the probability of scenario {S 1 S 2 } is usually very low, if it occurs, it would have great impact on the performance of the system. Fig 10 shows the total cost of this event for different budget constraints. First, as in the previous section, we do not consider financial resource constraints (the "without restriction" case), and then M = $ 10, 7 and 3 million. Finally, we also consider the worst-case situation with no investments in resilience enhancement actions (the "without IDR" case).
Therefore, the optimal strategy for scenario {S 1 S 2 } has a total cost of $13,500,600: approximately 88% less than the case in which no investments in resilience are made. In fact, IS represents 16% of total expected costs for the "without restriction" case and 98% for the "without IDR" case. This finding emphasizes that investments in pre-event actions to enhance resilience (including investments in adaptive, absorptive and restorative capacities) have the potential to enable better allocation of the available financial resources to improve the efficiency of the response if disruptive events occur.
Note that, as explained for case 4, PCR remains constant for all situations presented in Fig 10 since all of them represent the occurrence of scenario {S 1 S 2 }, and the system must fully recover over the time period of 8 hours (see Constraints 20 and 23). However, PCR is much greater for scenario {S 1 S 2 } than for case 4 because we would then have more severe consequences.
For the "without restriction" case, according to Fig 11, the resilience actions are (i) acquiring 17 diesel generators; (ii) establishing 4 backup connections from SS 3 to P 1 , P 2 , P 3 and P 4 (Fig 12); (iii) investing in additional capacity to SS 3 (50 MVA) to accommodate the backup connections; and (iv) investing in increasing the recovery rate (w = 5 MVA/hour). Note that (i) and (ii) are related to adaptive actions, whereas (iii) and (iv) concern absorption and restoration actions, respectively.
Fig 11. Figures (a, c, e, g, i) show the capacity recovery of SS 1 and PCR for M = "without restriction", $ 10, 7 and 3 million and "without IDR". Figures (b, d, f, h, j) present the supply portion that meets the demand of customers P 1 , M 1 and F 1 and the cost of IS for M = "without restriction", $ 10, 7 and 3 million and "without IDR". In addition, for each M, there is a list of resilience strategies employed on the left side of each figure.
Although the investment in the recovery rate seems small, note that each SS can only be stated as operational when at least K j of its capacity (50 MVA in this case) is fully recovered. Thus, this investment allows for the recovery of SS 1 to be completed in d = 3 hours (see Fig 11a). Although SS 1 and SS 2 have the same demand in MVA, note that SS 2 has more clients, which are ranked higher in importance than those of SS 1 (see Fig 3). Thus, the penalties would be higher if the clients of SS 2 are not rapidly supplied. In this manner, the model prioritizes pre-event (adaptive and absorptive) actions to enhance resilience for SS 2 clients, and it determines recovery strategies for SS 1 .
However, the sum of the clients' demands would be allocated as backup to SS 3 (P 1 , P 2 , P 3 , P 4 ), exceeding its additional capacity by 10 MVA and thus indirectly affecting the supply of its own clients. In fact, clients P 1 , P 2 , P 3 , and P 4 are prioritized because they have greater importance than the clients of SS 3 . To reduce this consequence, generators could be added to some clients of SS 3 , such as F 7 and F 8 . In this case, after an interruption, because P 1 is connected to SS 3 by means of a backup connection, its demand is not affected (Fig 11b). Table 7 shows the allocation of generators to each client; for the "without restriction" case, we also show the portion of their demand supplied by generators during SS 1 and SS 2 downtime. For instance, even during SS 2 downtime, M 1 will have 100% of its demand supplied because 5 diesel generators have been added (Fig 11b). In contrast, only 1 generator was allocated to F 1 . Because the supply capacity of the diesel generator is 2 MVA/hour, the supply of 40% of its demand is ensured until SS 1 is fully recovered by period d = 3 (Fig 11b). Therefore, this allocation actually reduces the overall expected penalties incurred due to unmet demand. Thus, by adopting this strategy, only 5% of the total demand originally allocated to SS 3 would not be supplied during concomitant SS 1 and SS 2 downtime.
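The generator coverage figures quoted above follow from simple arithmetic, sketched below. The 2 MVA capacity per generator is taken from the text; the client demands are assumptions: a demand of 5 MVA for F 1 is what the stated 40% coverage by one generator implies, and 10 MVA for M 1 is a hypothetical value (any demand up to 10 MVA gives full coverage with 5 generators). The helper name `supplied_fraction` is introduced here for illustration.

```python
GENERATOR_CAPACITY_MVA = 2.0  # capacity of one diesel generator, as stated in the text

def supplied_fraction(n_generators, demand_mva):
    """Fraction of a client's demand covered by its own generators, capped at 1."""
    if demand_mva <= 0:
        raise ValueError("demand must be positive")
    return min(1.0, n_generators * GENERATOR_CAPACITY_MVA / demand_mva)

# Assumed demands (MVA): inferred for F1, hypothetical for M1.
print(supplied_fraction(1, 5.0))   # F1 with one generator
print(supplied_fraction(5, 10.0))  # M1 with five generators
```

Under these assumed demands, F 1 is covered at 40% and M 1 at 100% during the downtime of their substations.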
For M = $ 10 million, the number of diesel generators was reduced by 35% (Table 7), and the 4 backup connections were now from SS 3 to P 2 , P 3 , P 4 and M 1 . In the "without restriction" case, the backup allocation to SS 3 affected the supply of its own clients (F 7 and F 8 ), which no longer occurs. However, in this case, supplying the demand of P 1 is greatly affected, as shown in Fig 11b, since only one generator is allocated to P 1 (Table 7). For client M 1 , because it is connected to SS 3 by means of a backup connection, its demand is not affected. Conversely, F 1 remains with one generator, thus ensuring the supply of 40% of its demand until SS 1 is fully recovered. In this case, three clients of SS 2 are also connected through backup to SS 3 (P 3 , P 4 and M 1 ). Thus, to minimize the impact, SS 1 should be recovered before SS 2 (Fig 11c), and the demand of their clients (P 1 being one of them) is supplied normally from period 3 (Fig 11d).
For M = $ 7 million, the total number of diesel generators decreases to 9, and the resilience strategy adopted for this case is more reactive because the highest amount of investment is directed to accelerating the recovery rate, which increases from 20 MVA/hour to 50 MVA/hour (w = 30 MVA/hour). Thus, the resources for SS recovery are shared between SS 1 and SS 2 so that both return to normal operation by the deadline d = 3 (Fig 11e). Another important point is that the fastest recovery speed was achieved for M = $ 7 million, even when compared to the "without restriction" case and M = $ 10 million.
For M = $ 3 million, investment is still made in (i) accelerating the recovery rate (w = 5 MVA/hour) and (ii) one generator each for clients P 3 , P 4 and M 1 . The recovery speed is similar to what was presented for the "without restriction" case and M = $ 10 million, the recovery of SS 1 being completed in three hours and that of SS 2 in five hours (Fig 11g). However, the results for the supply meeting the demand in this case are worse than those presented for M = $ 10 million (Fig 11b). Fig 11i and 11j also illustrate the worst situations ("without IDR" case), in which no resilience enhancement actions are implemented during the design phase.
Briefly, we can note that, when the budget is reduced, the cheapest strategy is to invest in (i) acquiring diesel generators and (ii) accelerating recovery. As mentioned before, using generators can reduce the impact of an event on the system because doing so can keep critical industrial equipment in minimal operating condition until the power supply returns to normal. For petrochemical clients, for example, the generators can be used to remove the work in process and to allow the system to restart without any further delays when the power supply returns.
However, Fig 13 illustrates the portion of the overall demand supplied in each situation, considering the performance for all clients over the 8-hour period. Fig 13 indicates that actions towards incorporating the absorption and adaptation capacities enable the response to be more effective than actions that focus on recovery. Moreover, our model reflects that it is economically unfeasible to ensure that 100% of the demand will be met should disruptive events occur. However, we can minimize the impact on the system (IS) by adopting pre-event resilient actions.

Conclusions
This paper proposed a model to optimize costs in the design phase of an EPSN related to industrial clients when resilience-based actions are considered. The MILP model developed was able to incorporate (i) several disruptions with their respective probabilities of occurrence and (ii) worst-case scenarios, in which a specific event with severe consequences is considered. In the first situation, the probabilities of occurrence of each of the mutually exclusive scenarios are considered, and the output of the model is the optimal strategy involving pre- and post-event actions that minimize the expected total cost. We assessed four different cases by varying the probability of the event {no failure}. In the second situation, the probability of the selected scenario was set to 1, while the probabilities of all other scenarios were 0. We evaluated the scenarios {S 1 } and {S 1 S 2 } involving the loss of SS 1 and the joint failure of SS 1 and SS 2 .
The model was validated by two types of sensitivity analysis. First, we increased the probability of the occurrence of an undesired event. From the results, we can see that our model indicated that the decision maker should also increase investments to design a more resilient system. In contrast, by reducing the probability of occurrence, no investment should be made. Thereafter, we also evaluated how the model behaves for different budgets. As expected, as we decreased the budget, the IS increased rapidly, indicating the usefulness of investing in resilience during the design phase. Note that the proposed model also indicated how the resources should be spent for each case.
The results obtained enabled the optimal solution to be analysed in terms of IS, IDR and PCR. Moreover, detailed IDR actions (e.g., redundant or backup lines, diesel generators) are real-world suggestions to improve the resilience of EPSN related to industrial clients. Thus, the impacts on EPSN clients due to disruptions were reduced, as evidenced in the sensitivity analysis, in which IS increased by reducing the investments in resilience strategies. This analysis also showed that the lower the investment in IDR, the greater the level of unmet demand, which can yield financial losses for the entire system.
Another important contribution is to draw attention to a paradigm change in how a power grid is viewed: the traditional stance is that the grid is system centred on electric power utilities. However, the new paradigm is that the grid is not only system centred but is also a customer-focused system, which is the reasoning followed by other authors, such as Kwasinski [64]. Therefore, our model includes strategies that can be applied both to electric power grids and by industrial customers. For example, such strategies include considering redundant or backup systems and diesel generators, thus allowing customers to make decisions about managing electric power, which has a strong influence on enhancing the overall resilience of the entire grid.
We point out some limitations of this work. First, we focused on adopting the "resilience triangle" concept. However, other capacities or strategies for resilience do exist, and they can be the focus of future research. For example, Lundberg [72] suggests considering the "learning" capacity to monitor and anticipate a disaster. Another possibility is to consider structural changes to increase the absorptive capacity of the system against shock (e.g., Raby et al. [73]). Moreover, we have considered the objective function as a weighted average of the costs of a set of possible interruption events, each with its respective probability. This could be thought of as a limitation because, for example, low-probability, high-consequence events and high-probability, low-consequence events are considered similar for resource allocation purposes. Despite that, the model allowed us to investigate specifically high-consequence events such as the failure of SS 1 and the simultaneous loss of SS 1 and SS 2 .
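The limitation of a probability-weighted objective can be made concrete with a two-event example. The probabilities and costs below are hypothetical and serve only to show that an expected-cost criterion cannot distinguish the two events, whereas fixing the probability of one scenario to 1, as done for {S 1 } and {S 1 S 2 }, exposes the difference in consequences.

```python
import math

def expected_cost(probability, consequence):
    """Probability-weighted cost of a single event."""
    return probability * consequence

# Hypothetical events: (probability, consequence in $).
rare_severe = (0.001, 10_000_000)
frequent_mild = (0.1, 100_000)

# Both events contribute about the same amount to a weighted-average objective ...
print(expected_cost(*rare_severe), expected_cost(*frequent_mild))

# ... but their consequences, once the event is assumed to occur, differ by a factor of 100.
print(expected_cost(1.0, rare_severe[1]) / expected_cost(1.0, frequent_mild[1]))
```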
Finally, developing a multi-objective optimization model is the subject of our ongoing research. In fact, we aim at minimizing the total costs related to the three resilience capacities (absorption, adaptation and recovery), as well as maximizing the level of service to industrial customers. Other topics of ongoing research involve (i) analysing how local energy storage can contribute to rendering the electric service at an industrial plant more resilient to disruptions and (ii) investigating metaheuristic solution methods (e.g., genetic algorithms) for more fine-grained networks; although the proposed MILP remains valid for such networks, their greater number of system nodes and links makes a metaheuristic an alternative worth exploring.