Abstract
Under the grid management framework for railway lines, each rail grid consists of several adjacent steel rails. The degradation processes of the steel rails within a grid are characterized by common-mode deterioration due to their similar spatial locations and accumulated gross tonnages. Additionally, in line with the railway repair and maintenance regulations, comprehensive replacements of rail grids are prohibited during the summer months, and similar restrictions apply in winter. To effectively minimize the operating and maintenance costs associated with a rail grid over a one-year period, this study develops a k-out-of-n: F system model that incorporates common-mode degradation. A dynamic and alternate replacement strategy is proposed, focusing on both system-level and component-level interventions. The degradation of each steel rail is modeled using a Gamma process, while the dependence structure of the system is characterized through a copula function. The maintenance model is structured as a Markov Decision Process (MDP). To address the MDP problem, approximate and analytical expressions for discretized state transition probabilities within the multiple-component system are derived. These expressions are determined using the copula function, and an algorithm is designed to construct the corresponding transition probability matrix. The monotonicity of value functions is also explored. A numerical example is provided to demonstrate the feasibility and effectiveness of the proposed model.
Citation: Wang L, Ding M, Liu B, Qiu Q (2025) Optimizing alternate replacement strategy of k-out-of-n: F systems with common-mode degradation: An application to railway grid. PLoS One 20(5): e0322001. https://doi.org/10.1371/journal.pone.0322001
Editor: Zhengmao Li, Aalto University, FINLAND
Received: January 18, 2025; Accepted: March 16, 2025; Published: May 22, 2025
Copyright: © 2025 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: No data used.
Funding: This work is supported partly by the National Natural Science Foundation of China (72271169, 72371030 and 72001026).
Competing interests: No competing interest.
1. Introduction
Numerous industrial systems are comprised of multiple components that must work together to efficiently and effectively achieve their objectives. The interdependence among these components is unavoidable [1,2]. Ignoring component interdependence can lead to overestimating the reliability of multi-component repairable systems and result in impractical maintenance decisions [3]. Consequently, integrating the interdependence has become crucial in reliability analysis and maintenance strategy optimization. The advances in sensing and monitoring technologies have increased interest in condition-based maintenance (CBM) [4,5]. However, optimizing CBM strategies for systems with interdependent components presents significant challenges in both academic and industrial contexts. This is due to the complex dependence structures, multiple failure effects, and the need to balance maintenance costs with the system reliability [6].
This study is motivated by the operation and maintenance of railway rails in China, where steel rails represent a vital component of the railway track infrastructure. As time progresses, the performance of these rails gradually deteriorates, making regular maintenance and replacement essential to ensure the safety, smoothness, and comfort of train journeys. In China, a comprehensive grid management strategy for railway lines is currently being rolled out. This approach categorizes railway lines into adjacent segments, each spanning 1 km in length, referred to as a “grid”. As shown in Fig 1, a grid includes several steel rails. The condition of individual steel rails can be classified into four states: normal, minorly defective, severely defective, or broken [7]. The state of a rail grid is defined by the number of severely defective or broken rails it contains; if this count exceeds a predetermined threshold, the entire grid is replaced. Both the threshold and the number of defective rails are influenced by geographical factors specific to the region. To model this phenomenon, rail grids can be conceptualized as a k-out-of-n: F system.
According to [8], several heterogeneous factors contribute to the degradation of steel rails, including spatial characteristics such as narrow-radius curves and steep gradients, as well as cumulative gross tonnage. Given that rails within a grid experience the same accumulated tonnage, their degradation is often subject to common-mode failure. Additionally, rails located in sections with tight curves or steep slopes exhibit similar common-mode dependence due to their shared environmental conditions. Moreover, China’s railway repair and maintenance regulations [9,10] prohibit comprehensive replacements of rail grids to mitigate the risks of rail expansion during hot summer months, a guideline that remains applicable in winter as well. Conversely, replacement activities at the component level are relatively more flexible. In this context, this paper proposes a novel k-out-of-n: F repairable system model that accounts for common-mode deterioration. Additionally, it discusses various alternate replacement strategies at both the system and component levels, providing insights that could enhance the efficacy of maintenance practices within the rail industry.
2. Related literature
2.1. Maintenance optimization for systems with common-mode deterioration
The literature discusses various forms of dependence, including stochastic, economic, and structural [11–14]. Among these, stochastic dependence has garnered the most attention due to its complexity. Stochastic dependence in maintenance modeling can be categorized into three types based on different characteristics: failure interactions, load sharing, and common mode deterioration [6,15,16]. Methods for describing stochastic dependence typically involve multivariate joint distribution models, copula-based models, and degradation rate interaction models [17].
Our work focuses on multi-component systems with common-mode deterioration. Common-mode deterioration occurs when components degrade simultaneously because they share similar operating conditions. A key feature of this phenomenon is that the acceleration or deceleration of degradation typically occurs concurrently across components. Various studies have addressed reliability analysis and the optimization of operation and maintenance for multiple-component systems experiencing common-mode deterioration [18]. Approaches such as multivariate subordinator processes [19,20] and Wiener processes with positive correlation coefficients [21] have been employed to model the dependence structures of these systems.
2.2. Reliability modeling based on copula functions
The copula-based method, introduced by Sklar in 1959, has found extensive application in reliability modeling [22–24]. This approach effectively separates the dependence structure of a set of random variables from their marginal distribution functions, facilitating easier parameter estimation. Additionally, it permits the definition of various multivariate distributions with consistent dependence structures by allowing different marginal distributions, thus accommodating complex dependency structures such as nonlinear dependencies and tail dependencies [25]. The function that links multiple distributions is known as an ordinary copula, which has been utilized to quantify the dependence structure of component lifetimes [26–30] and component deterioration increments over specified time intervals [31,32].
To address the limitations posed by ordinary copulas, particularly their reliance on time and the potentially unbounded number required for modeling stochastic processes, Tankov extended copula theory to include Lévy copulas. Lévy copulas quantify the dependence structure among processes using finite and time-independent parameters and allow for distinct marginal processes. For instance, [33] employed Lévy copulas to characterize the common-mode degradation dependence between two components, proposing a novel condition-based maintenance strategy. Building on this, [6] utilized nested Lévy copulas to quantify the hierarchical common-mode degradation dependence structure of multiple components and explored a new inspection and replacement strategy. In these strategies, the degradation information for unmonitored components at inspection epochs is updated based on the most recently monitored component and the conditional copula functions. By leveraging stochastic dependence, these strategies have been shown to be more efficient.
Regarding steel rails, univariate Gamma processes and bivariate Gamma subordinator processes have been used to model wear degradation [34,35]. The optimization of operations and maintenance for steel rails has also been studied, highlighting the significance of effective strategies in prolonging the life and reliability of these critical infrastructure components.
The main contributions of this paper are as follows: (1) A k-out-of-n: F repairable system with common-mode deterioration dependence is constructed to model the damage processes of rail grids. Following reports from the professional field, the common-mode dependence among steel rails in a grid is characterized by a Lévy copula. (2) Based on the repair and maintenance management rules for railway lines over one year, an alternate replacement strategy at the system level and the component level is proposed. The strategy can be used to develop the annual maintenance and repair plan for rail grids. A finite-stage Markov decision process (MDP) model is established to obtain the optimal strategy. (3) To solve the MDP problem, analytical and approximate expressions of the discretized state transition probabilities for the multiple-component system are obtained. The expressions are determined by the Lévy copula function, and an algorithm is designed to compute the transition probability matrix.
The remainder of the paper is organized as follows. Modelling assumptions and the alternate replacement strategy are given in Section 3. In Section 4, the system states are discretized and the analytical expressions of the system state transition probabilities are discussed. To obtain the optimal alternate replacement strategy at the system level and the component level, a four-stage Markov decision process model is established in Section 5, where a state transition probability matrix algorithm and a backward dynamic programming algorithm are designed and the monotonicity of the value functions is discussed. In Section 6, a numerical example is given to illustrate the efficiency of the model, and a sensitivity analysis of the dependence strength and cost-related parameters is performed.
3. Modelling assumptions and the alternate replacement strategy
3.1. Modelling k-out-of-n systems with common-mode deterioration
We consider a $k$-out-of-$n$: F system which fails if and only if at least $k$ of the $n$ components fail. Let $X_i(t)$ denote the degradation state of component $i$ at time $t$. The degradation increment of component $i$ from time $s$ to $t$, $X_i(t) - X_i(s)$, is Gamma distributed and the initial state $X_i(0) = 0$. Component $i$ fails if its degradation state exceeds the failure threshold $L$.
The random vector $\mathbf{X}(t) = (X_1(t), \ldots, X_n(t))$ is used to denote the system degradation state at time $t$. Assume that the system state increments $\mathbf{X}(t) - \mathbf{X}(s)$ are independent and time homogeneous. The degradation processes of all components are common-mode dependent and the dependence structure can be quantified by a Lévy copula function. According to the limit theorem of Lévy copulas, given time $t$, we can find the appropriate ordinary copula $C_t$ such that the joint distribution of $\mathbf{X}(t)$ is [6]
$$F_t(x_1, \ldots, x_n) = C_t\big(F_{1,t}(x_1), \ldots, F_{n,t}(x_n)\big) \qquad (1)$$
where $F_{i,t}$ is the marginal distribution of $X_i(t)$ and $C_t$ is dependent on $t$.
Fig 2 shows a sample deterioration path for a four-component system modeled by Gamma processes with copula dependence. The four components follow a similar degradation trend. Furthermore, the degradation paths of components 1 and 2 are particularly close to each other, and those of components 3 and 4 show no significant difference.
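The behaviour described for Fig 2 can be mimicked with a small simulation. The sketch below is an illustration only, not the authors' procedure. It assumes that the increments over each inspection interval are coupled by a Clayton copula (the family used later in the numerical example), draws copula samples with the Marshall-Olkin frailty construction, and inverts the Gamma marginal numerically. All function names, parameter values, and the seed are hypothetical.

```python
import math
import random


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the regularized incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (shape + k)
        total += term
        k += 1
    log_front = -z + shape * math.log(z) - math.lgamma(shape)
    return min(1.0, total * math.exp(log_front))


def gamma_quantile(u, shape, scale):
    """Invert gamma_cdf by bisection (adequate for a sketch, not optimized)."""
    lo, hi = 0.0, shape * scale
    while gamma_cdf(hi, shape, scale) < u:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_paths(n_steps, shapes, scale, theta, seed=0):
    """Simulate degradation levels at inspection epochs 0, 1, ..., n_steps.

    Assumption: per-interval increments have Gamma(shape_i, scale) marginals
    and a Clayton copula with parameter theta > 0.
    """
    rng = random.Random(seed)
    levels = [0.0] * len(shapes)
    paths = [list(levels)]
    for _ in range(n_steps):
        # Marshall-Olkin: one shared frailty per interval creates the common mode.
        frailty = rng.gammavariate(1.0 / theta, 1.0)
        for i, shape in enumerate(shapes):
            u = (1.0 + rng.expovariate(1.0) / frailty) ** (-1.0 / theta)
            u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep the inversion well defined
            levels[i] += gamma_quantile(u, shape, scale)
        paths.append(list(levels))
    return paths
```

A call such as `simulate_paths(8, [1.0, 1.0, 1.5, 1.5], 1.0, 2.0)` returns nine rows of four nondecreasing degradation levels. Larger values of `theta` make the paths move together more tightly.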
Copula functions are chosen to quantify the common-mode dependence among steel rails in a grid for the following reasons: (1) The copula method is convenient for parameter estimation and flexible in describing dependence relationships. (2) A large body of literature has used copula functions to model the dependence structures among components' degradation processes, as reviewed in Section 2.2. (3) In [6], copula functions are used to characterize the dependence structure of the degradation processes of four wheels in a vehicle. Rolling friction and sliding friction are the main causes of degradation for both wheels and steel rails. Thus the degradation mechanism of the rail grid is similar to that of the wheel system.
3.2. The alternate replacement strategy with system level and component level
The track inspection cars are capable of assessing the condition of steel rails, including wear levels, deformation, and any abnormal noises. At the start of each season, managers should plan the annual maintenance activities in accordance with maintenance regulations based on the condition of each rail grid as reported by the inspection cars. To effectively schedule the maintenance activities of a rail grid over a one-year period, we set the time inspection interval to be one season and consequently the terminating time of one-year can be denoted by
within the lifecycle of a rail grid. Then we employ MDP to model the alternate replacement decisions with the objective of minimizing the cost over
(the cost over one-year).
The failures of the system and components are non-self-announcing and the system is monitored periodically with time interval . We consider a four-stage decision problem with the terminating time given by
. The inspection is merely an information-taking action that reveals perfectly the degradation level of all components. After the inspection, the alternate replacement strategy of system level and component level is performed as follows.
- (1) At time
, if the system fails, it is replaced in its entirety; otherwise, the system is either replaced in its entirety or left untouched.
- (2) At time
, if the system fails, it is replaced in its entirety; otherwise, each component is either replaced or left untouched.
- (3) At time
, the system-level replacement strategy is the same as that at time
. At time
, the component-level replacement strategy is the same as that at time
.
- (4) The time for inspections and replacements is negligible. The cost for the replacement of component
is
. The system replacement in entirety or replacements of components will incur a constant set-up cost
. Given the system degradation state after the replacements or doing nothing at an inspection epoch
, let
be the penalty cost of system failure over an inspection interval, where
is the fundamental penalty cost of system failure,
is the number of failed components. We assume that the function
increases with
.
4. Discretizing the system states and the system state transition probabilities
4.1. Discretizing the system states
A discrete state space is often required in order to solve an MDP, and it is also practical in engineering applications. In this section, the states of the multiple-component system are discretized and the action spaces of the alternate replacement strategies are given [31,36,37].
Let be the discretization interval of the continuous degradation components. The interval
can be discretized into
equally sized intervals. Component
is said in state
if and only if
and the failed state
corresponds to the interval
. After the discretization, the state space of each component
. The illustration of state discretization for one component is shown in Fig 3. In Fig 3, the continuous states of one component are discretized into 5 states and 5 is the failed state. The failure threshold is equal to
.
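As a minimal sketch of this discretization, the function below maps a continuous degradation level to a discrete state. It assumes left-closed intervals of equal width and a failure threshold equal to the number of working states times the interval width, which matches the five-state illustration of Fig 3. The names `discretize`, `system_failed`, `interval`, and `n_working` are hypothetical.

```python
import math


def discretize(x, interval, n_working):
    """Return the discrete state (1..n_working + 1) of a degradation level x.

    States 1..n_working are working states; n_working + 1 is the failed state.
    Assumption: the failure threshold equals n_working * interval.
    """
    if x < 0:
        raise ValueError("degradation level must be nonnegative")
    state = math.floor(x / interval) + 1
    return min(state, n_working + 1)


def system_failed(states, n_working, k):
    """A k-out-of-n: F system fails when at least k components are failed."""
    failed = sum(1 for s in states if s == n_working + 1)
    return failed >= k
```

With four working states and unit interval width, a level of 3.2 falls in state 4, while any level of 4.0 or more is mapped to the failed state 5.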
Let
then
where is the incomplete Gamma function. Let
be the discretized system state space, then
. For every
, from the modeling assumptions and the properties of joint distribution functions [24]
Let denote a maintenance action for the system, where
and
stand for the replacement of component
and doing nothing on it, respectively. According to the modeling assumptions, under the replacement strategy of system level, the action space
, where
,
and the action space for the replacement strategy of component level,
.
For , let
be the system state after a maintenance action
, then
Specially, ,
.
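The effect of a maintenance action on a discretized state can be sketched in a few lines. The code assumes, as in the text, that 1 stands for replacing a component and 0 for doing nothing, and that a replaced component returns to state 1. The function names are hypothetical.

```python
import itertools


def apply_action(state, action):
    """State after maintenance: replaced components (action 1) return to state 1."""
    return tuple(1 if a == 1 else s for s, a in zip(state, action))


def system_actions(n):
    """System-level action space: do nothing, or replace every component."""
    return [(0,) * n, (1,) * n]


def component_actions(n):
    """Component-level action space: any subset of components may be replaced."""
    return list(itertools.product((0, 1), repeat=n))
```

For a four-component system the component-level space has 16 actions, and `apply_action((2, 3, 1, 3), (0, 1, 0, 1))` gives `(2, 1, 1, 1)`.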
4.2. State transition probabilities
Let
be the state of the system at the
th inspection. When no maintenance actions are taken, the state transition probability matrix over an inspection interval is denoted as
, where
Due to the independent and time homogeneous increment degradation process, is also time homogeneous.
In the MDP, the transition probability is usually approximated by [31,32]
i.e., we assume each component is at the left endpoint of its continuous degradation interval at the inspection epoch. In the literature, Monte Carlo simulation is usually used to estimate the probabilities in Equation 2. In this paper, analytical expressions of these probabilities under different cases are discussed.
Case 1:
and there exists at least one pair
such that
.
If there exists at least one pair such that
, then at least one component’s degradation state in the system reduces over
time. From the increasing property of the Gamma process, this event is impossible when no maintenance action is taken, hence
under this case. Therefore,
is an upper triangular matrix.
Case 2: and for every
,
.
Under this case, all components are in working states after and before the transition. Furthermore, all components degrade to worse states after the transition. From Equation 2,
Since the increments of the system state are independent and time homogeneous, Equation 3 can be rewritten as
From Equation 1, the above equation can be expressed as
This paper only involves the analysis of transition probabilities over time interval, for simplicity, we denote
as
and
as
, respectively.
In summary, under case 2
If the failed state is 6, then and
meet with the conditions of case 2 and there are 16 terms on the right side of Equation 4. Furthermore
is four-dimensional vector whose elements are 0 or 1. For
, the term corresponds to it is
.
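The count of 16 terms reflects a general fact: the probability that an n-dimensional increment falls in a rectangle is an alternating sum of the copula evaluated at the 2^n corners, indexed by 0/1 vectors. The sketch below illustrates this inclusion-exclusion sum. It is not the paper's Equation 4 verbatim: the Clayton family and the convention that the inputs are marginal CDF values at the lower and upper interval endpoints are assumptions made for the example, and the function names are hypothetical.

```python
import itertools
import math


def clayton(u, theta):
    """n-dimensional Clayton copula with parameter theta > 0."""
    if any(v <= 0.0 for v in u):
        return 0.0  # a copula vanishes when any argument is zero
    return (sum(v ** (-theta) for v in u) - len(u) + 1.0) ** (-1.0 / theta)


def rectangle_probability(lower_u, upper_u, copula):
    """P(lower < increment <= upper) from marginal CDF values at the endpoints.

    Each 0/1 vector e selects one corner: the upper endpoint where e[m] == 1
    and the lower endpoint where e[m] == 0. The sign is (-1)^(number of zeros).
    """
    n = len(lower_u)
    total = 0.0
    for e in itertools.product((0, 1), repeat=n):
        corner = [upper_u[m] if e[m] else lower_u[m] for m in range(n)]
        sign = (-1) ** (n - sum(e))
        total += sign * copula(corner)
    return total
```

Under independence the sum reduces to a product of marginal differences, for example `rectangle_probability([0.2, 0.1], [0.5, 0.4], math.prod)` is 0.3 times 0.3, which is a convenient sanity check before plugging in a dependent copula.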
Case 3: ,
,
, where
Under case 3, within time interval, the state of component
remains unchanged and in working state and the other
components degrade to worse and working states.
By the law for the intersection of events, under case 3,
From Equation 1,
where . By setting the
th
variable in copula function
as the value shown in Equation 5, we can obtain
.
It is known from the nonnegative property of that
, hence, for
,
. Consequently, for
, from the property of copula functions, if there exists at least one
,
in Equation 5 will be zero. Furthermore, if
,
and
. To sum up, under case 3,
If the failed state is 6, then ,
meet with the conditions of case 3. The states of components 1 and 2 are the same within
time interval. According to Equation 6, there are 4 terms on the right side of it and
is two-dimensional vector whose elements are 0 or 1. Furthermore,
Case 4: ,
, where
,
.
Under case 4, at the inspection epoch , all components are in working states and at the inspection epoch
, component
fails and the other
components remain in working states. According to the modeling assumptions and Equation 2,
From Equations 1, 7 can be rewritten as
Noting that we have
If the failed state is 6, then ,
meet with the conditions of case 4. Components 1 and 2 transit to the failure state within
time interval. According to Equation 8, there are 16 terms on its right side. For
, the term corresponds to
is
.
Case 5: ,
, where
and
.
Under case 5, at the inspection epoch , component
fails, the other components are in working states, at the inspection epoch
, component
fails and the remaining
components are still in working state. Using the similar methods to those of cases 1–4, the transition probability under case 5 can be given as follows,
If , then
,
meet with the conditions of case 5. The states of components 1 and 2 are the same within
time interval. According to Equation 9, there are 8 terms on the right side of it. For
, the term corresponds to
is
.
5. Four-stage MDP model
In this section we formulate the alternate replacement problem within the framework of a four-stage MDP. An algorithm for computing the discretized state transition probability matrix and a backward dynamic programming algorithm are designed to obtain the optimal alternate replacement strategy.
5.1. The value functions
Given the system state
at inspection epoch
, let
be the minimal cost over
. Under the MDP structure,
is the value function and can be determined by Bellman equations. For simplicity, we denote the inspection epoch
as decision epoch 1, the inspection epoch
as decision epoch 2, the inspection epoch
as decision epoch 3, the inspection epoch
as decision epoch 4.
Let be the system state at decision epoch 1. According to the modeling assumptions, when the system fails at decision epoch 1, i.e.,
, the system will be replaced in entirety that incurs a set-up cost
, the replacement cost
and
. If the system has not failed at the inspection, we can choose the action from
that minimizes the future expected cost. For action
, the cost at the first stage
includes three parts: the cost for replacement in entirety
, the set-up cost after the replacement
, the failure penalty cost over
. After the replacement in entirety, the number of failed components is zero, hence
. For action
, no maintenance action is taken, hence the cost over the first stage
is equal to the failure penalty cost
and
, where
. Therefore,
can be given in a form of the following Bellman equations
where ,
and they are the expected value functions at the next inspection epoch (the decision epoch 2) given the system states after the maintenance actions.
and
are given by
At the decision epoch 2, when the system fails, the system will also be replaced in entirety. Otherwise, an action will be chosen from
such that the future expected cost will be minimized. Given an action
and a system state
, the cost over the second stage
is the sum of the cost for the replacements in component level
, the system set-up cost
and the failure penalty cost over
, denoted by
. Furthermore, if component
is replaced, a cost
will occur, hence
. The set-up cost
will occur if at least one component fails, that is at least
, consequently
To sum up,
where is the expected value function at the next decision epoch (the decision epoch 3), given the system state after the action
, it can be expressed as
By the similar method to that of , for the system state
at the inspection epoch
, the value function can be given by the following Bellman equations
By the similar method to that of , for the system state
at the inspection epoch
,
can be given by the following Bellman equations
where and
are expected value functions at next decision epoch (the decision epoch 5), given the system states after the actions
and
, respectively. They can be obtained by the following expressions
where ,
is the salvage value of the system corresponding to the state
in the end of the fourth stage,
is the depreciation coefficient,
is a decreasing function of the failure component number
.
The maintenance decision process of the four-stage MDP is sketched in Fig 4. The maintenance actions ,
and
,
are chosen from action spaces
and
, respectively, such that the total expected cost over one year
is minimized.
5.2. Algorithm to obtain state transition probability matrix
For the convenience of our statement, the states before and after a transition will be numbered. We denote the state order number before a transition by and that after a transition by
. In the following,
is the component order number.
is the
th row and the
th column element of matrix
.
is an n-dimensional vector whose elements are 0 or 1. According to the results in Section 4.2, the following algorithm is proposed to calculate the state transition probability matrix.
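A minimal sketch of how such a matrix could be assembled is given below. It is an illustration under stated assumptions and not a transcription of the paper's Algorithm 1: states are enumerated in lexicographic order, each working component is placed at the left endpoint of its interval, the increments are coupled by a Clayton copula, and the marginal CDFs are passed in as callables. All names are hypothetical.

```python
import itertools
import math


def clayton(u, theta):
    """n-dimensional Clayton copula; returns 0 if any argument is 0."""
    if any(v <= 0.0 for v in u):
        return 0.0
    return (sum(v ** (-theta) for v in u) - len(u) + 1.0) ** (-1.0 / theta)


def cell_probability(lower_u, upper_u, theta):
    """Inclusion-exclusion over the 0/1 corner vectors of a rectangle."""
    n = len(lower_u)
    total = 0.0
    for e in itertools.product((0, 1), repeat=n):
        corner = [upper_u[m] if e[m] else lower_u[m] for m in range(n)]
        total += (-1) ** (n - sum(e)) * clayton(corner, theta)
    return total


def transition_matrix(n, n_working, interval, cdfs, theta):
    """No-maintenance transition matrix over one inspection interval."""
    failed = n_working + 1
    states = list(itertools.product(range(1, failed + 1), repeat=n))
    matrix = []
    for s in states:
        row = []
        for t in states:
            lower_u, upper_u, feasible = [], [], True
            for m in range(n):
                if t[m] < s[m]:          # degradation cannot decrease (Case 1)
                    feasible = False
                    break
                if s[m] == failed:       # a failed component stays failed
                    lo, hi = 0.0, 1.0
                else:
                    lo = cdfs[m]((t[m] - s[m]) * interval)
                    hi = 1.0 if t[m] == failed else cdfs[m]((t[m] - s[m] + 1) * interval)
                lower_u.append(lo)
                upper_u.append(hi)
            row.append(cell_probability(lower_u, upper_u, theta) if feasible else 0.0)
        matrix.append(row)
    return states, matrix
```

As a usage example, `transition_matrix(2, 2, 1.0, [cdf, cdf], 2.0)` with an exponential `cdf` (a Gamma distribution with shape 1, chosen only to keep the example dependency-free) yields a 9 by 9 upper triangular matrix whose rows sum to one.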
5.3. Backward dynamic programming algorithm
To obtain the optimal alternate replacement policy of system level and component level, a backward dynamic programming algorithm will be given in this section. In the algorithm, is the order number of stages,
is the order number of actions,
is the number of elements in set
,
is the temporary variable for the value functions,
is a temporary variable for the optimal action. The details of the algorithm are as follows.
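The backward recursion can be sketched on a toy problem. The example below is not the paper's Algorithm 2 and does not use its parameters: it assumes a hypothetical 2-out-of-2: F system with three states per component, independent components (chosen only to keep the transition matrix short), a set-up cost of 2, a replacement cost of 3 per component, a penalty of 10 per failed component, and a zero terminal value.

```python
import itertools

FAILED, K = 3, 2  # hypothetical: state 3 is failed; the system fails when 2 components fail
P1 = [[0.6, 0.3, 0.1], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]  # hypothetical one-component chain
STATES = list(itertools.product((1, 2, 3), repeat=2))
INDEX = {s: i for i, s in enumerate(STATES)}
# Independence is assumed here for brevity only; the paper uses a copula instead.
MATRIX = [[P1[s[0] - 1][t[0] - 1] * P1[s[1] - 1][t[1] - 1] for t in STATES] for s in STATES]


def n_failed(state):
    return sum(1 for x in state if x == FAILED)


def post_state(state, action):
    return tuple(1 if a else s for s, a in zip(state, action))


def stage_cost(state, action):
    setup = 2.0 if any(action) else 0.0
    return setup + 3.0 * sum(action) + 10.0 * n_failed(post_state(state, action))


def system_actions(state):
    if n_failed(state) >= K:
        return [(1, 1)]                      # a failed system must be replaced
    return [(0, 0), (1, 1)]


def component_actions(state):
    if n_failed(state) >= K:
        return [(1, 1)]
    return list(itertools.product((0, 1), repeat=2))


def backward_induction(action_sets, terminal_value):
    """Finite-horizon backward recursion; action_sets[j] applies at stage j + 1."""
    values = [terminal_value(s) for s in STATES]
    policies = []
    for actions_for in reversed(action_sets):
        expected = [sum(p * v for p, v in zip(row, values)) for row in MATRIX]
        new_values, policy = [], {}
        for s in STATES:
            best_cost, best_action = None, None
            for a in actions_for(s):
                cost = stage_cost(s, a) + expected[INDEX[post_state(s, a)]]
                if best_cost is None or cost < best_cost:
                    best_cost, best_action = cost, a
            new_values.append(best_cost)
            policy[s] = best_action
        values = new_values
        policies.append(policy)
    policies.reverse()
    return dict(zip(STATES, values)), policies


# Stages alternate between system-level and component-level decisions.
VALUES, POLICIES = backward_induction([system_actions, component_actions] * 2, lambda s: 0.0)
```

`VALUES` holds the stage-1 expected cost for every state and `POLICIES[j]` the minimizing action at stage j + 1. In this toy setting the failed state (3,3) costs exactly the replacement outlay of 8 more than the new state (1,1).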
5.4. Properties of the value functions
In this section, the monotonicity of the value functions is discussed [38–40]. Before presenting the main results, we first list some definitions and a theorem on stochastic orders.
Definition 1 For two system states ,
, we denote
if and only if
[41].
Obviously, if , then the degradation states of all components for state
are
worse than those of state . The order is a partial one.
Definition 2 Let be a univariate or a multivariate function with domain in
. If
, whenever
, then we say that the function
is increasing [41].
Definition 3 Let and
be two random vectors such that
for all upper sets
.
Then is said to be smaller than
in the usual stochastic order (denoted by
).
if, and only if,
holds for all increasing functions
for which the expectations exist [41].
Theorem 1 Let the random vector and
have a common copula and
,
. If
, then
[41].
The result of Theorem 1 is significant because it allows the traditional stochastic order problem involving random vectors to be transformed into a simpler problem of univariate random variables, which is more straightforward to verify, provided that they share a common copula. To explore the monotonicity of the value functions presented in this paper, we first outline several key propositions.
Proposition 1 For a system state , the number of failed components
is increasing in
.
Proof: Consider two system states ,
such that
. If
, then
. If
, then
. If
, then
. To sum up, if
, then
. Furthermore,
,
. Hence
is increasing as
increases. ■
Proposition 1 states that a system state that is componentwise worse never contains fewer failed components. Hence the failure penalty, which depends on the number of failed components, cannot decrease as the system state deteriorates.
Proposition 2 For a maintenance action , the system state after
,
is increasing as
increases.
Proof: Let and
be two system states such that
. For a maintenance action
, if
,
, hence
. For
,
. Thus, if
,
. From definition 1,
■.
Proposition 2 states that, under any fixed maintenance action, a worse initial state leads to a worse state after maintenance. By the increasing property of the penalty function and Propositions 1 and 2, we can obtain the following corollary.
Corollary 1 For any maintenance action ,
is increasing in
.
Corollary 1 indicates that the expected penalty cost of system failure over an inspection interval increases as the initial state increases. On the monotonicity of value functions, we have the following theorem.
Theorem 2 The value function is increasing as
increases.
Proof: We prove Theorem 2 via the mathematical induction method. Recall that ,
and
is a decreasing function of the failed component number
,
is obviously increasing in
. From proposition 2,
is also increasing in
. Thus
is increasing as
increases.
Suppose that, for ,
is increasing as
increases, we will show that
is increasing in
.
Let denote the system state at stage
after a maintenance action. For two samples of
,
,
such that
, we consider the usual stochastic order of the system states
and
.
and
are the system state increments over time interval
given the initial values
and
, consequently they have the common copula function
. Suppose that
is the degradation level of component
at stage
, given the state at stage
after a maintenance action, that is
. Similarly, let
. For each
,
Due to the nondecreasing property of ,
. Obviously, for the other
,
also holds. Hence
. From Theorem 1,
. That is
is increasing in
.
According to the increasing property of on
, Proposition 2 and Theorem 1, for a maintenance action
,
is increasing in
. Furthermore, in Bellman Equations 10, 13, 15 and 16 of Section 5.1,
and
are independent of
and
is also increasing in
. Hence,
is increasing as
increases. ■
The theorem guarantees that a worse system state will lead to a higher cost to go, which enables decision makers to compare and estimate the future costs based on the current observations.
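In practice, the monotonicity stated in Theorem 2 can serve as a sanity check on a computed value table. The sketch below tests it under the componentwise partial order of Definition 1. The dictionary layout (state tuples mapped to costs) and the function names are assumptions made for illustration.

```python
import itertools


def dominates(worse, better):
    """True if every component of `worse` is at least as degraded as in `better`."""
    return all(w >= b for w, b in zip(worse, better))


def is_increasing(values, tol=1e-12):
    """Check that value[t] >= value[s] whenever state t dominates state s.

    Pairs that are not comparable under the partial order are ignored.
    """
    for s, t in itertools.permutations(values, 2):
        if dominates(t, s) and values[t] < values[s] - tol:
            return False
    return True
```

Note that states such as (1,2) and (2,1) are not comparable, so the check places no restriction on the order of their values.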
5.5. Comparison models
To illustrate the effectiveness of the alternate replacement strategy with system level and component level proposed in our paper, two comparison models are built in this section.
For comparison model 1, at decision epochs and
, a system replacement will be adopted if the system fails (the number of failed components exceeds the threshold
); otherwise, nothing is done. At decision epochs
and
, the component-level replacement strategy is the same as that of the original model. Let
be the minimal cost over
for comparison model 1. According to the model assumptions, at the decision epoch
, a system replacement will be adopted if the system fails, hence
At the decision epoch , the replacement strategy of component level will be performed, therefore
Similarly, and
can be given by the following equations
For comparison model 2, at all decision epochs, the system will be replaced entirely if it fails; otherwise, nothing is done. Let be the minimal cost over
for comparison model 2, then they can be obtained recursively by the following equation
where
In comparison model 1, only the component-level replacement actions are optimized. In comparison model 2, replacement actions are triggered solely by system failures. Under the alternate replacement strategy proposed in this paper, by contrast, the system-level and component-level replacement actions are optimized jointly.
6. Numerical studies and analysis
In this section, a numerical example will be given to illustrate the feasibility of the model. Sensitivity analysis of dependence strength and cost-related parameters is also performed.
6.1. Optimal replacement policy
Consider a 3-out-of-4: F system. The parameters of the Gamma degradation increment processes within time interval associated with component
are
. They are listed in Table 1.
Suppose that the dependence structure among the degradation increment processes within time interval can be modeled by a Clayton copula function as follows
where ,
.
The degradation states of the four components are discretized into three states, labeled 1, 2 and 3. The failure threshold . The cost parameters are shown in Table 2.
The penalty cost of system failures over an inspection interval . The value function for stage 5 is
.
Using Algorithm 1, implemented in MATLAB, we can calculate the state transition probability matrix . The value functions and the associated alternate replacement strategy can then be obtained with Algorithm 2. The value functions and the associated alternate replacement strategies for stages 1 and 2 are shown in Tables 3 and 4.
In stage 1 of the maintenance strategy, if the system fails (that is, it has at least three failed components), the entire system is replaced. Conversely, if the system is still functioning, it may either be replaced entirely or left untouched. Table 3 shows that for eight failed states, namely (1,3,3,3), (2,3,3,3), (3,1,3,3), (3,2,3,3), (3,3,2,3), (3,3,3,1), (3,3,3,2), and (3,3,3,3), the designated action is to replace the system in its entirety, and the value functions for these states are identical. For the other system states, a clear trend emerges: the worse the system state, the more often complete replacement is the optimal action. For example, in Group 1 the action (0,0,0,0), which represents taking no action, is prevalent, but it becomes less frequent in Groups 2 and 3. This trend is consistent with the decision rules of the model.
In stage 2, the replacement strategy focuses on component-level interventions. If the system has failed (at least three failed components), the whole system must be replaced. If the system has not failed, each component may be replaced or left untouched depending on its condition. Table 4 shows that for the eight failed states the prescribed action is (1,1,1,1) and the corresponding value functions are identical, which reflects a uniform strategy once the system has failed. For the other states, a clear trend emerges: the worse the system's overall condition, the more likely component-level replacements become. For instance, the component replacement action, denoted by 1, appears more often in the actions of Group 3 of Table 4 than in those of Group 1.
The increasing property of the value functions is further illustrated by Tables 3 and 4. For instance, in Group 1 of Table 3, the states that are better than the reference state (1,2,2,1), namely (1,1,1,1), (1,1,2,1), and (1,2,1,1), have lower value functions than state (1,2,2,1), whereas the states that are worse than (1,2,2,1) have higher value functions, confirming the expected increasing trend. Some states, however, cannot be ranked straightforwardly against one another, and their value functions follow no simple order. For example, in Group 3 of Table 4, the value functions for states such as (3,1,2,1), (3,1,3,1), (3,1,3,2), (3,3,1,1), and (3,3,1,2) are all lower than that of state (3,2,1,3), while those for states such as (3,1,3,3), (3,3,1,3), (3,3,2,2), and (3,3,3,1) are greater.
6.2. Analysis of the comparison results
The value functions of the comparison models for stage 1 (the expected costs over one year) under various system states are listed in Table 5. Tables 3 and 5 indicate that, for any system state, the alternate replacement strategy is the most economical and comparison 2 has the highest expected cost over one year. Furthermore, the costs of the alternate replacement strategy and comparison 1 are nearly the same, whereas the cost difference between the alternate replacement strategy and comparison 2 is larger. Under the parameter values in Tables 1 and 2, the advantage of the alternate replacement strategy therefore comes mainly from the dynamic optimization of component-level replacement actions.
6.3. The effect of system parameters on the value functions
To gain more insight into the model, this section examines the effect of several system parameters on the value functions. Unless otherwise specified, when one parameter is varied, the other parameters keep the values used in Section 6.1.
Let the dependence strength parameter vary from 0 to 20 in steps of 1. The resulting value functions for the different system states are sketched in Fig 5. The results indicate the following. (a) For a given system state, the value function increases with the dependence strength. As the dependence strength grows, the common-mode deterioration among components becomes stronger, the system degrades faster, and the probabilities of system or component failures over the four stages increase. (b) The value functions for states at an intermediate level are more sensitive to the dependence strength. This trend may be related to the chosen copula function. Degradation increment processes coupled by a Clayton copula exhibit lower tail dependence; that is, they are strongly interdependent when their degradation states are relatively good. However, the value functions consist of costs that are incurred when the system is in worse states, so the value functions for the better states, such as (1,1,1,1) and (1,2,1,3), are almost insensitive to the dependence strength. For the poor states, such as (3,3,3,3) and (2,3,2,3), the interdependency is relatively weak. Furthermore, the system can be replaced in its entirety, returning it to the best state, and every component can also be replaced separately. For these two reasons, the value functions of the poor states are also almost insensitive to the dependence strength. For states at an intermediate level, the interdependency of components is relatively strong and replacement actions are chosen with higher probability; thus, the value functions are more sensitive to the dependence strength.
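To give a feel for what a dependence strength between 0 and 20 means, the sketch below tabulates two standard pairwise summaries of a Clayton copula: Kendall's tau, theta / (theta + 2), and the lower tail dependence coefficient, 2^(-1/theta). These closed forms come from general copula theory rather than from this study, the symbol theta and the function names are illustrative, and the sketch assumes the exchangeable Clayton form in which every pair of components shares the same parameter. The value 0 is treated as the independence limit.

```python
def clayton_kendall_tau(theta: float) -> float:
    """Kendall's tau of a bivariate Clayton copula: theta / (theta + 2)."""
    if theta < 0:
        raise ValueError("theta must be non-negative")
    return theta / (theta + 2.0)


def clayton_lower_tail_dependence(theta: float) -> float:
    """Lower tail dependence coefficient 2 ** (-1 / theta); 0 in the limit theta -> 0."""
    if theta < 0:
        raise ValueError("theta must be non-negative")
    if theta == 0:
        return 0.0  # independence limit
    return 2.0 ** (-1.0 / theta)


if __name__ == "__main__":
    print("theta  tau     lambda_L")
    for theta in (0, 1, 2, 5, 10, 20):
        tau = clayton_kendall_tau(theta)
        lam = clayton_lower_tail_dependence(theta)
        print(f"{theta:5d}  {tau:.4f}  {lam:.4f}")
```

Both summaries increase with theta and approach 1, so most of the change in dependence occurs at small and moderate values of the parameter.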
Let the set-up cost vary from 20 to 200 in steps of 10. The resulting value functions for the different system states are presented in Fig 6. The results indicate the following. (a) When the system is in the best state (1,1,1,1), the set-up cost has almost no impact on the value function. For this state, the optimal action in each of the four stages is to do nothing, so no set-up cost is incurred and the value function does not change with the set-up cost. (b) For the other states, the worse the system state and the larger the set-up cost, the larger the value function. Intuitively, as the system state worsens, replacement actions become more likely; hence, the value functions are larger and also more sensitive to changes in the set-up cost.
Let the fundamental failure penalty cost of system failures vary from 20 to 200. The value functions for the different penalty costs are sketched in Fig 7. The figure indicates the following. (a) When the system is in the best state, the penalty cost has little impact on the value function, because the system failure probabilities for the best state are very small over the four stages. (b) For the other states, the worse the system state and the larger the penalty cost, the larger the value function. The worse the system state, the higher the probability of system failure, and hence the more the value function increases as the fundamental penalty cost increases.
7. Conclusions
Based on the operational and maintenance practices for steel rails, this study constructs a k-out-of-n: F system model that incorporates common-mode degradation, together with a dynamic and alternate replacement strategy that covers both system-level and component-level interventions. The dependence structure of the system is characterized using a copula function, which provides a framework for analyzing relationships between components. The strategy can guide the annual maintenance and repair planning of railway lines. To derive the optimal maintenance strategy, an MDP model consisting of four stages is established. The model includes analytical expressions for the discretized state transition matrix of the multiple-component system, and an algorithm is designed to implement these expressions efficiently. The monotonicity of the value functions is verified by mathematical induction. A numerical example is presented to demonstrate the effectiveness of the proposed model and to check the conclusions drawn. The results of this example indicate that, under the Clayton copula structure, the value functions increase as the dependence strength, set-up cost, and fundamental failure penalty cost rise. Furthermore, states at an intermediate level are more sensitive to changes in dependence strength.
The common-mode deterioration phenomenon can be found in many fields, such as electronic, mechanical, structural, automotive, and aerospace engineering. The modeling method and maintenance strategy proposed in this paper can support reliability assessment and the optimization of maintenance strategies in these fields.
For future research, the modeling and optimization of replacement activities for a group of steel rail grids is recommended. For such a group, diverse maintenance decisions, such as group maintenance and opportunistic maintenance, are possible and are also relevant in practice. Moreover, the modeling and maintenance optimization of steel rail grids with hierarchical stochastic dependency deserve further investigation.
Acknowledgments
The authors would like to thank the editor and the reviewers for their insightful comments and suggestions.
References
- 1. Zhang W, Zhang X, He S, Zhao X, He Z. Optimal condition-based maintenance policy for multi-component repairable systems with economic dependence in a finite-horizon. Reliab Eng Syst Saf. 2024;241:109612.
- 2. Yousefi N, Coit DW, Song S. Reliability analysis of systems considering clusters of dependent degrading components. Reliab Eng Syst Saf. 2020;202:107005.
- 3. Zeng Z, Barros A, Coit D. Dependent failure behavior modeling for risk and reliability: a systematic and critical literature review. Reliab Eng Syst Saf. 2023;239:109515.
- 4. Feng C, Shao L, Wang J, Zhang Y, Wen F. Short-term load forecasting of distribution transformer supply zones based on federated model-agnostic meta learning. IEEE Trans Power Syst. 2025;40(1):31–45.
- 5. Hu Z, Su R, Veerasamy V, Huang L, Ma R. Resilient frequency regulation for microgrids under phasor measurement unit faults and communication intermittency. IEEE Trans Ind Inf. 2025;21(2):1941–9.
- 6. Li H, Zhu W, Dieulle L, Deloux E. Condition-based maintenance strategies for stochastically dependent systems using Nested Lévy copulas. Reliab Eng Syst Saf. 2022;217:108038.
- 7.
Liu R, Bai L, Wang F, Grid . A new theory for high-speed railway infrastructure management. Transportation Research Board 94th Annual Meeting. Washington DC, United States. 2015.
- 8. Bai L, Liu R, Wang F, Sun Q, Wang F. Estimating railway rail service life: a rail-grid-based approach. Transp Res Part A: Policy Practice. 2017;105:54–65.
- 9.
The Railway Ministry of the People’s Republic of China. Rules of railway track maintenance. Railway Transport No. 41. China Railway Publishing House; 2010.
- 10.
The Railway Ministry of the People’s Republic of China. Catalogue of rail defects. TB/T 1778-2010. China Railway Publishing House; 2010.
- 11. Zhang N, Fouladirad M, Barros A. Optimal imperfect maintenance cost analysis of a two-component system with failure interactions. Reliab Eng Syst Saf. 2018;177:24–34.
- 12. Shahraki AF, Yadav OP, Vogiatzis C. Selective maintenance optimization for multi-state systems considering stochastically dependent components and stochastic imperfect maintenance actions. Reliab Eng Syst Saf. 2020;196:106738.
- 13. Dinh D-H, Do P, Iung B. Multi-level opportunistic predictive maintenance for multi-component systems with economic dependence and assembly/disassembly impacts. Reliab Eng Syst Saf. 2022;217:108055.
- 14. Qin S, Wang BX, Tsai T-R, Wang X. The prediction of remaining useful lifetime for the Weibull k-out-of-n load-sharing system. Reliab Eng Syst Saf. 2023;233:109091.
- 15. Tian T, Wang N, Yang J, Miao Z, Li L. Availability evaluation and maintenance optimization of balanced systems considering state-dependent inspection intervals. IEEE Trans Rel. 2025;74(1):2241–54.
- 16. Zheng R, Fang H, Peng Z. Condition-based maintenance for a balanced system considering dependent soft and hard failures. Comput Ind Eng. 2024;197:110550.
- 17. Zhao Y, Cozzani V, Sun T, Vatn J, Liu Y. Condition-based maintenance for a multi-component system subject to heterogeneous failure dependences. Reliab Eng Syst Saf. 2023;239:109483.
- 18. Gao S, Wang J, Zhang J. Reliability analysis of a redundant series system with common cause failures and delayed vacation. Reliab Eng Syst Saf. 2023;239:109467.
- 19. Mercier S, Pham HH. A preventive maintenance policy for a continuously monitored system with correlated wear indicators. Eur J Operat Res. 2012;222(2):263–72.
- 20. Huynh KT, Vu HC, Nguyen TD, Ho AC. A predictive maintenance model for k-out-of-n:F continuously deteriorating systems subject to stochastic and economic dependencies. Reliab Eng Syst Saf. 2022;226:108671.
- 21. Zhao X, Wang Z. Maintenance policies for two-unit balanced systems subject to degradation. IEEE Trans Rel. 2022;71(2):1116–26.
- 22. Xu J, Liang Z, Li Y-F, Wang K. Generalized condition-based maintenance optimization for multi-component systems considering stochastic dependency and imperfect maintenance. Reliab Eng Syst Saf. 2021;211:107592.
- 23. Genest C, Okhrin O, Bodnar T. Copula modeling from Abe Sklar to the present day. J Multivariate Anal. 2024;201:105278.
- 24.
Joe H. Dependence modeling with copulas. Chapman and Hall/CRC. 2014.
- 25. Zheng Y, Zhang Y. Reliability analysis for system with dependent components based on survival signature and copula theory. Reliab Eng Syst Saf. 2023;238:109402.
- 26. Safaei F, Châtelet E, Ahmadi J. Optimal age replacement policy for parallel and series systems with dependent components. Reliab Eng Syst Saf. 2020;197:106798.
- 27. Eryilmaz S, Ozkut M. Optimization problems for a parallel system with multiple types of dependent components. Reliab Eng Syst Saf. 2020;199:106911.
- 28. Torrado N. Optimal component-type allocation and replacement time policies for parallel systems having multi-types dependent components. Reliab Eng Syst Saf. 2022;224:108502.
- 29. Davies K, Dembińska A. On the residual lifetimes of dependent components upon system failure. Reliab Eng Syst Saf. 2024;248:110147.
- 30. Zhao X, Li R, Han H, Qiu Q. Condition-based switching, loading, and age-based maintenance policies for standby systems. Eur J Operat Res. 2025;321(2):565–85.
- 31. Andersen JF, Andersen AR, Kulahci M, Nielsen BF. A numerical study of Markov decision process algorithms for multi-component replacement problems. Eur J Operat Res. 2022;299(3):898–909.
- 32. Xu J, Zhao X, Liu B. A risk-aware maintenance model based on a constrained Markov decision process. IISE Transactions. 2021;54(11):1072–83.
- 33. Li H, Deloux E, Dieulle L. A condition-based maintenance policy for multi-component systems with Lévy copulas dependence. Reliab Eng Syst Saf. 2016;149:44–55.
- 34. Wang J, Qiu Q, Wang H. Joint optimization of condition-based and age-based replacement policy and inventory policy for a two-unit series system. Reliab Eng & Syst Saf. 2021; 205: 107251.
- 35. Mercier S, Meier-Hirmer C, Roussignol M. Bivariate Gamma wear processes for track geometry modelling, with application to intervention scheduling. Struct Infrastruct Eng. 2012;8(4):357–66.
- 36. Qiu Q, Maillart LM, Prokopyev OA, Cui L. Optimal condition-based mission abort decisions. IEEE Trans Rel. 2023;72(1):408–25.
- 37. Qiu Q, Li R, Zhao X. Failure risk management: adaptive performance control and mission abort decisions. Risk Anal. 2025;45(2):421–40. pmid:39108177
- 38. Zhao X, Lv Z, Qiu Q, Guo B. Optimal dynamic condition‐based mode switching policy for systems with main and auxiliary components. Naval Res Logist. 2024.
- 39. Yang L, Wei F, Ma X, Qiu Q. Controlling mission hazards through integrated abort and spare support optimization. Risk Anal. 2025:10.1111/risa.17696. pmid:39755373
- 40. Zhao X, Chen P, Tang LC. Condition-based maintenance via Markov decision processes: a review. Front Eng Manag. 2025.
- 41.
Shaked M, Shanthikumar J. Stochastic orders. New York, NY, USA: Springer; 2007.