A multi-event combination maintenance model based on event correlation

Because of the complexity of large production systems, maintenance events are diverse, simultaneous and dynamic. Appropriate maintenance management of complex large production systems can guarantee high availability and reduce maintenance costs. However, current maintenance decision-making methods mainly focus on the maintenance events of single components and of multi-component systems connected in series; little research has addressed the combined maintenance of different maintenance events. Therefore, this paper proposes a multi-event combination maintenance model based on event correlation. First, the maintenance downtime and cost of three types of maintenance events under different maintenance beginning times and maintenance degrees are analysed. Then, shared maintenance downtime and cost models are established from maintenance event correlations. In addition, a multi-event combination maintenance model is constructed to achieve the highest availability and the lowest cost rate over both the decision-making cycle and the remaining life. Moreover, a particle swarm optimization algorithm based on interval segmentation is designed to solve the model. Finally, a numerical example is presented to illustrate the model.


Introduction
The maintenance cost of modern production systems accounts for a large proportion of the total life-cycle cost [1][2][3], so maintenance management is becoming increasingly important. Without reasonable maintenance decisions, maintenance labour and cost are wasted and maintenance resources are consumed unnecessarily. At the same time, downtime costs may be incurred and the effective operating time of the production system is reduced. Therefore, it is necessary and urgent to formulate reasonable and effective maintenance strategies. A complex large production system has a wide range of coexisting maintenance events, and these events are dynamically updated as production continues. Maintenance events are therefore diverse, simultaneous and dynamic, and managing them properly and effectively is crucial.
There is much research in this field. At present, research on maintenance decision modelling falls into two parts: single-component and multi-component methods.
Maintenance decision-making models for single components are more common, and their methods are mature. Studies of multi-component systems mostly assume a single series (tandem) system, and little research exists on multi-component systems with complex structure.
For a single-component system, there are five relatively mature maintenance decision-making models: the delay time model, the proportional hazard model, the shock model, the Lévy process model and the Markov decision process model. The delay time model (DTM) [4] has been widely applied to the modelling and optimization of inspection for the two-stage failure process of a single component with a single failure mode [5][6][7][8][9]. For the proportional hazard model, many attempts have been made to relate the failure probability to both historical service life and condition monitoring variables [10,11]. The shock model has been successfully applied in many fields, such as physics, communication, electronic engineering and medicine, and a growing number of researchers have become interested in it [12][13][14][15][16][17]. Of the Lévy process model and the Markov decision process model, the Lévy process model has been used to determine condition-based maintenance policies [18][19][20]. Single-component maintenance decision models are relatively mature and are very effective for single-component maintenance management problems. However, they are inadequate when multiple components operate together in a complex large production system. Therefore, many researchers have also studied multi-component maintenance decision models in depth.
For multi-component systems, there are three relatively mature maintenance decision-making methods: group maintenance, bulk maintenance and opportunity maintenance. For group maintenance, dynamic programming models have been presented to determine optimal policies for two- and three-component equipment [21]. R Dekkert et al. developed a methodology to represent the cost-effectiveness of combining activities and to identify an optimal combination plan [22]. For bulk maintenance, D Assaf et al. considered optimum group maintenance policies for a set of N machines subject to stochastic failures under continuous and periodic inspections [23]. For opportunity maintenance, RE Wildeman et al. proposed a rolling-horizon approach that takes a long-term tentative plan as the basis for subsequent adaptation according to information that becomes available in the short term [24].
Multi-component maintenance decision models have some shortcomings in managing the maintenance events of complex large production systems. Most studies of multi-component maintenance decisions assume that the equipment is a single whole component or a series of connected components [25,26]. However, actual equipment is a complex production system with a mixed structure that involves many maintenance events. In addition, most research assumes that the system is repaired either to its pre-fault state or to an as-good-as-new state, whereas in practice repair is often incomplete. Moreover, fault retention has received little consideration.
Therefore, to address these deficiencies, this paper proposes a multi-event combination maintenance model based on event correlation. The model is of great significance for the maintenance management of complex large production systems. Combined with current system health monitoring technology [27,28], it enables real-time decision-making, which can greatly reduce maintenance cost and increase the availability of complex large systems [29][30][31][32].
The structure of this paper is organized as follows. Section 2 presents the related work. Section 3 describes the methodology, including the maintenance downtime cost model, the shared maintenance downtime and cost model, a multi-event combination maintenance model and particle swarm optimization algorithm. Section 4 uses a numerical example to verify the accuracy of the model. Finally, the conclusion and discussion are presented in Section 5.

Related work
To construct the model mentioned in this paper, the following related works are necessary.

1) Degraded Event opportunity maintenance thresholds
A Degraded Event refers to a component that degrades during operation [5,33,34]. During the degradation process, an opportunity maintenance threshold is set.
Fig 1 shows how the state of a component changes as it degrades. m(t) represents the component state. M_r(t) represents the risk threshold: when the component state reaches this level, the component must be repaired. M_w(t) is the pre-warning threshold: when the component state reaches this level, component degradation begins. S(t, Z) represents the state change function of the component over time. M_r(t) and M_w(t) can change over time. ΔM is the opportunity maintenance interval. T_1 is the pre-warning threshold time and T_2 is the risk threshold time, so [T_1, T_2] is the possible time interval for opportunity maintenance.
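The window [T_1, T_2] is where the state curve first reaches M_w(t) and then M_r(t). A minimal sketch of locating it numerically follows, assuming that S(t, Z) is monotone in t with the covariates Z held fixed (so it is written as S(t)) and that each threshold is crossed exactly once; the functions `first_crossing` and `opportunity_window` and the linear curves in the usage line are hypothetical illustrations, not the paper's formulation.

```python
from typing import Callable, Tuple

Curve = Callable[[float], float]


def first_crossing(g: Curve, lo: float, hi: float, tol: float = 1e-6) -> float:
    """Bisection for the earliest time at which g(t) becomes non-negative.

    Assumes g(lo) < 0 <= g(hi) and a single sign change in [lo, hi],
    i.e. the state curve crosses the threshold exactly once.
    """
    if not (g(lo) < 0 <= g(hi)):
        raise ValueError("threshold is not crossed within [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid  # state still below the threshold
        else:
            hi = mid  # state at or above the threshold
    return hi


def opportunity_window(S: Curve, M_w: Curve, M_r: Curve,
                       t_start: float, t_end: float) -> Tuple[float, float]:
    """Return (T1, T2): the times at which S(t) reaches M_w(t) and M_r(t)."""
    T1 = first_crossing(lambda t: S(t) - M_w(t), t_start, t_end)
    T2 = first_crossing(lambda t: S(t) - M_r(t), t_start, t_end)
    return T1, T2


# Hypothetical linear degradation path with constant thresholds.
T1, T2 = opportunity_window(lambda t: 0.01 * t, lambda t: 4.0, lambda t: 8.0, 0.0, 1000.0)
```

With these placeholder curves the window is approximately [400, 800]. Time-varying thresholds only change the lambdas passed in.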
2) Timed Event opportunity maintenance thresholds
As Fig 2 shows, the Timed Event opportunity maintenance threshold is set as (μ, P); the detailed derivation can be found in [35]. μ represents the maximum lead time of the Timed Event, P is the specified repair time for the Timed Event, and λ represents the status of the component. The opportunity maintenance strategies are as follows:
i. t ∈ [0, μ): Minimum maintenance is conducted if a minor fault is detected, and complete maintenance is conducted if a major fault is detected.
ii. t ∈ [μ, P): Complete maintenance is conducted when a minor or major fault occurs. If the component has not failed at this time but other maintenance events are detected in the system, the component is repaired together with the faulty component.
iii. t = P: This is the specified last repair time.
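Rules i to iii form a small decision table. The sketch below encodes them directly. The fault labels `"minor"` and `"major"`, the `other_events` flag and the `Action` names are hypothetical. Returning no action when no fault is found in [0, μ) is an assumption, because the rules leave that case open.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "no action"
    MINIMUM = "minimum maintenance"
    COMPLETE = "complete maintenance"
    COMBINED = "repair together with other detected events"
    SPECIFIED = "specified last repair"


def timed_event_action(t: float, mu: float, P: float,
                       fault: Optional[str], other_events: bool) -> Action:
    """Apply rules i-iii for a Timed Event with threshold (mu, P).

    fault is None, "minor" or "major"; other_events says whether other
    maintenance events are currently detected in the system.
    """
    if not (0 <= mu <= P and 0 <= t <= P):
        raise ValueError("require 0 <= mu <= P and 0 <= t <= P")
    if t == P:                      # rule iii: specified last repair time
        return Action.SPECIFIED
    if t < mu:                      # rule i: t in [0, mu)
        if fault == "minor":
            return Action.MINIMUM
        if fault == "major":
            return Action.COMPLETE
        return Action.NONE          # assumption: no action without a fault
    # rule ii: t in [mu, P)
    if fault in ("minor", "major"):
        return Action.COMPLETE
    return Action.COMBINED if other_events else Action.NONE
```

For example, `timed_event_action(15, 10, 20, None, True)` falls in [μ, P) with no fault but another event pending, so the component is combined with that event.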
3) Particle swarm algorithm
The particle swarm algorithm is a traditional optimization algorithm [36]. Its basic steps are as follows:
i. Initialize the particle swarm: set the population size, and randomly generate the position and velocity of each particle.
ii. Construct a fitness function to calculate the fitness of each particle.
iii. According to the fitness function values, obtain the best position Gbest over all particles and the best position Pbest[i] of each particle.
iv. Update the velocity and position of each particle.
v. Repeat the updates until the optimal solution is obtained.
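Steps i to v can be sketched as a standard global-best PSO that minimizes a single fitness function over a box. The inertia weight w = 0.7 and acceleration coefficients c1 = c2 = 1.5 are common textbook choices and are assumptions here. The population of 50 and 100 iterations mirror the settings later used in the numerical example. This is not the interval-segmentation variant designed in Section 3.4.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(f: Callable[[Sequence[float]], float], dim: int, bounds: Tuple[float, float],
        n_particles: int = 50, iters: int = 100,
        w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> Tuple[List[float], float]:
    """Minimise f over the box [lo, hi]^dim with global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    span = hi - lo
    # Step i: random initial positions and velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-0.1, 0.1) * span for _ in range(dim)] for _ in range(n_particles)]
    # Steps ii-iii: evaluate fitness, record Pbest[i] and Gbest.
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            # Step iv: velocity and position update; positions clamped to the box.
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    # Step v: after repeated updates, Gbest approximates the optimum.
    return gbest, gbest_val


# Usage on a toy fitness function (sum of squares, minimum 0 at the origin).
best, value = pso(lambda x: sum(v * v for v in x), dim=2, bounds=(-10.0, 10.0))
```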
Because particle swarm optimization searches quickly, it can be adapted to solve the multi-objective maintenance optimization model constructed in this paper. The detailed algorithm design is given in Section 3.4.

Methodology
The multi-event combination maintenance model based on event correlation proposed in this paper aims to solve the management problem of maintenance events in complex large systems. For complex large-scale systems, maintenance events can be divided into Fault Events, Degradation Events and Timed Events. A Fault Event occurs when a component fails and must be repaired afterwards. Depending on whether the fault can be retained, Fault Events are classified into Retentive Fault Events and Non-retentive Fault Events; whether a fault can be retained is determined by the impact of the fault itself. A Degradation Event refers to the degradation of component performance, which requires preventive maintenance. A Timed Event is a specified maintenance event required by technical requirements or management regulations. This paper focuses on analysing these three types of maintenance events. Because maintenance time and cost are the two main quantitative maintenance indicators of complex large systems, availability and maintenance cost rate are used as the decision-making objectives.
The process of model construction is shown in Fig 3. Model construction includes four parts: the maintenance downtime and cost model, the shared maintenance downtime and cost model, the multi-event combination maintenance model and the particle swarm optimization algorithm. These are introduced in turn in the following subsections.

Maintenance downtime and cost model
Because a complex large system operates continuously, the state of its components changes continuously over time, and maintenance downtime and cost depend on component state. Therefore, maintenance downtime and cost can be expressed as functions of the maintenance beginning time. In addition, different maintenance degrees have different effects on maintenance downtime and cost.
As discussed in the related work (Timed Event opportunity maintenance thresholds), the maintenance time and work content of a Timed Event are determined in advance, so its maintenance cost C_pi and maintenance downtime T_pi are assumed to be constant. Thus, this paper focuses on the maintenance downtime and cost models of Fault Events and Degradation Events.

1) Non-retentive Fault Event
It is assumed that a Non-retentive Fault Event is detected on component i; the maintenance downtime and cost model of component i can then be expressed as follows.

2) Retentive Fault Event
It is assumed that a Retentive Fault Event is detected on component i within the opportunity maintenance threshold; the maintenance downtime and cost model of component i is expressed with the following notation:
T_fi: The time when a Retentive Fault Event is detected.
T_min(t_i − T_fi): The minimum maintenance downtime when a Retentive Fault Event is detected at T_fi and is handled at t_i.
T_max(t_i − T_fi): The maximum maintenance downtime when a Retentive Fault Event is detected at T_fi and is handled at t_i.
C_fri(t_i): The maintenance cost when Retentive Fault Event i is handled at the maintenance time t_i.
C_min(t_i − T_fi): The minimum maintenance cost when a Retentive Fault Event is detected at T_fi and is handled at t_i.
C_max(t_i − T_fi): The maximum maintenance cost when a Retentive Fault Event is detected at T_fi and is handled at t_i.
δ_i: Maintenance degree, with range [0, 1]. δ_i = 0 means minimum maintenance, δ_i = 1 means complete maintenance, and 0 < δ_i < 1 means incomplete maintenance.

3) Fault Event maintenance downtime and cost model construction
The Non-retentive Fault Event and Retentive Fault Event maintenance downtime models can be combined together. The Fault Event maintenance downtime and cost model can be expressed as
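The combined Fault Event formula is not reproduced here. The sketch below therefore assumes that downtime and cost interpolate linearly in δ_i between the minimum and maximum functions of the retention time τ = t_i − T_fi. This is consistent with the linearity assumption stated in the conclusion and with the downtime functions of the numerical example, but it is not the paper's equation. A Non-retentive Fault Event is treated as handled at detection (τ = 0), which is also an assumption. The name `fault_event_model` is hypothetical.

```python
from typing import Callable, Tuple

Fn = Callable[[float], float]


def fault_event_model(t_i: float, T_fi: float, delta: float,
                      T_min: Fn, T_max: Fn, C_min: Fn, C_max: Fn,
                      retentive: bool = True) -> Tuple[float, float]:
    """Return (downtime, cost) for a Fault Event handled at t_i with degree delta.

    Assumes linear interpolation in delta between the minimum (delta = 0)
    and complete (delta = 1) maintenance functions of tau = t_i - T_fi.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("maintenance degree must lie in [0, 1]")
    if t_i < T_fi:
        raise ValueError("a fault cannot be handled before it is detected")
    if not retentive and t_i != T_fi:
        raise ValueError("assumption: a non-retentive fault is handled when detected")
    tau = t_i - T_fi  # retention time
    downtime = T_min(tau) + delta * (T_max(tau) - T_min(tau))
    cost = C_min(tau) + delta * (C_max(tau) - C_min(tau))
    return downtime, cost


# Component 1 functions as given in the numerical example (t = retention time).
d, c = fault_event_model(1000.0, 1000.0, 0.5,
                         T_min=lambda t: t / 100 + 0.2, T_max=lambda t: t / 50 + 0.4,
                         C_min=lambda t: t / 8 + 100, C_max=lambda t: t ** 2 / 2 + 200)
```

Handled immediately with δ = 0.5, component 1 gets a downtime of about 0.3 and a cost of 150 under this assumed interpolation.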

Degradation event.
As discussed in the related work (Degraded Event opportunity maintenance thresholds), the maintenance downtime and cost model of a Degradation Event is expressed with the following notation:
T_di: The time when a Degradation Event is detected.
t_i: The maintenance beginning time.
The maintenance downtime when Degradation Event i is handled at the maintenance time t_i.
T_min(t_i − T_di): The minimum maintenance downtime when a Degradation Event is detected at T_di and is handled at t_i.
T_max(t_i − T_di): The maximum maintenance downtime when a Degradation Event is detected at T_di and is handled at t_i.
The maintenance cost when Degradation Event i is handled at the maintenance time t_i.
C_min(t_i − T_di): The minimum maintenance cost when a Degradation Event is detected at T_di and is handled at t_i.
C_max(t_i − T_di): The maximum maintenance cost when a Degradation Event is detected at T_di and is handled at t_i.

Maintenance event correlation.
To facilitate the combination of the maintenance events, the correlation between maintenance events needs to be analysed. According to engineering experience and expert analysis, at present, maintenance event correlation is generally divided into fault correlation, time correlation, structure correlation and function correlation. The specific meaning of each correlation is shown in Table 1.
The impact of maintenance event correlation is shown in Fig 4. Analysis of the four correlations shows the following. Fault correlation affects the system failure rate and thus the overall health of the system. Time correlation reduces maintenance cost. Structure correlation makes it possible to avoid repeating the overlapping part of the maintenance work, saving maintenance cost and downtime. Function correlation I reduces maintenance costs through shared maintenance resources. Function correlation II saves both maintenance costs and maintenance downtime.
The shared part is generated when maintenance events are combined, and the remainder is the own part, as shown in Fig 5. By analysing the characteristics of event combination, the basic event combination set is established. The own part and the shared part of each maintenance event combination are then obtained according to the correlations, laying the foundation for the multi-event combination maintenance model.

Shared maintenance downtime model.
According to correlation analysis, because of the existence of structure correlation and function correlation II, combining maintenance events will reduce maintenance downtime.

Table 1. Specific meaning of the four correlations.
Correlation | Specific meaning
Fault correlation | Fault correlation can be divided into three categories. In the first, when a component fails, other components may fail with different probabilities as a result. In the second, when a component fails, the failure rates of other related components are affected. In the third, when a component fails, related components are impacted, and if the impact reaches a certain extent, the related components fail.
Time correlation | When a component needs repair, other components in the system need to be repaired at the same or a nearby time.
Structure correlation | There is a certain overlap in the maintenance processes of two components; in other words, the maintenance processes of both components contain the same maintenance steps.
Function correlation | Function correlation can be divided into two categories. Function correlation I means that components can share the same maintenance resources because of similar functions. Function correlation II means that shared parts can be repaired when there are multiple faulty components in the system.

The following assumptions and notations are made:
i. There are N_0(t) components needing repair at time t.
ii. DT^1_ij(t): shared downtime at time t due to the structure correlation between components i and j.
iii. DT^2_ij(t): shared downtime at time t due to function correlation II between components i and j.
The shared maintenance downtime model is constructed as follows. Matrix B_i describes the correlation between maintenance event i and the other maintenance events, and the shared maintenance downtime between maintenance event i and the other maintenance events is determined for each of the two correlations. Combining these gives the shared maintenance downtime model, from which the total shared maintenance downtime between maintenance event i and the other maintenance events due to the two correlations is obtained.
Because the correlation between two maintenance events is mutual, this method counts the shared maintenance downtime of every pair of events twice. Thus, the total shared maintenance downtime of all the maintenance events must be halved.
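The per-event summation and the final halving can be sketched as follows. As an assumption about layout, the rows of B_i for all events are stacked into two symmetric N_0 × N_0 indicator matrices, one for structure correlation and one for function correlation II. `DT1` and `DT2` hold the pairwise shared downtimes DT^1_ij(t) and DT^2_ij(t) at a fixed t. The function name and the numbers in the usage lines are hypothetical. The shared cost model for the three cost correlations follows the same sum-then-halve pattern.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def shared_downtime(B_struct: Matrix, B_func2: Matrix,
                    DT1: Matrix, DT2: Matrix) -> Tuple[List[float], float]:
    """Per-event shared downtime and the halved system total.

    B_struct[i][j], B_func2[i][j]: 1 if events i and j have structure
    correlation / function correlation II, else 0 (symmetric matrices).
    DT1[i][j], DT2[i][j]: shared downtime for that pair and correlation.
    """
    n = len(B_struct)
    per_event = []
    for i in range(n):
        total_i = sum(B_struct[i][j] * DT1[i][j] + B_func2[i][j] * DT2[i][j]
                      for j in range(n) if j != i)
        per_event.append(total_i)
    # Each pair (i, j) appears in both row i and row j, so halve the sum.
    return per_event, 0.5 * sum(per_event)


# Hypothetical 3-event case: events 0-1 structure-correlated, 1-2 function-II-correlated.
B_s = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
B_f = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
DT1 = [[0, 0.1, 0], [0.1, 0, 0], [0, 0, 0]]
DT2 = [[0, 0, 0], [0, 0, 0.05], [0, 0.05, 0]]
per_event, total = shared_downtime(B_s, B_f, DT1, DT2)
```

Here each correlated pair contributes once to the total, giving about 0.15 rather than the double-counted 0.3.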
Shared maintenance cost model.
According to correlation analysis, because of the existence of time correlation, structure correlation and function correlation, combining maintenance events will reduce maintenance costs. The following assumptions and notations are made:
i. There are N_0(t) components needing repair at time t.
ii. C_Stop(t): Downtime loss per unit time when a maintenance event is conducted at t.
iii. C_Fixed: Fixed shared maintenance cost.

1) Shared maintenance cost according to time correlation
The shared maintenance cost according to the time correlation between maintenance event i and maintenance event j can be expressed as follows.

2) Shared maintenance cost according to function correlation
Maintenance cost can be shared through shared maintenance resources (function correlation I) and through reduced logistics delay times (function correlation II). The shared maintenance cost according to function correlation can therefore be expressed as follows.

3) Shared maintenance cost according to structure correlation
The shared maintenance cost according to the structure correlation between maintenance event i and maintenance event j can be expressed as follows, where C_p is the unit labour cost.

4) Shared maintenance cost model
Matrix A_i: The correlation between maintenance event i and the other maintenance events; an entry is 1 if the correlation exists and 0 otherwise. C_i (3 × N_0) represents the shared maintenance cost between maintenance event i and the other maintenance events according to the three correlations. The shared maintenance cost model is built from A_i and C_i, and from it the total shared maintenance cost between maintenance event i and the other maintenance events due to the three correlations is obtained. Because the correlation between two maintenance events is mutual, this method counts the shared maintenance cost of every pair of events twice, so the total shared maintenance cost of all the maintenance events must be halved.

Notation and assumptions.
i. There are n components in the system numbered from 1 to n in order.
ii. (λ_di, λ_fi): The failure rate threshold of opportunity maintenance for component i.
xvii. n_1 + 1 to n_1 + n_2: The serial numbers of the Degradation Events.

Model construction.
Suppose there are three dummy variables, ω_fi, ω_di and ω_pi, and N_0 = n_1 + n_2 + n_3.
The combination matrix is defined for the case where the maintenance events are arbitrarily divided into i blocks, together with the kth sub-combination of the jth combination.
Δc_jk: The shared maintenance cost of the kth sub-combination of the jth combination when the N_0 maintenance events are arbitrarily divided into i blocks.
From these, the shared maintenance cost of all the maintenance events and the maintenance cost after combination maintenance are obtained. Complete maintenance restores components to the initial condition; minimum maintenance restores the failure rate to its value just before the fault; incomplete maintenance returns components to a state between the initial condition and the state before repair.
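Dividing the N_0 events "arbitrarily into i blocks" amounts to enumerating set partitions. The sketch below generates every partition recursively: each new event either opens its own block or joins an existing one. It can then filter by block count i. The function names are hypothetical. The paper's combination matrix may order or encode these partitions differently.

```python
from typing import List


def set_partitions(events: List[int]) -> List[List[List[int]]]:
    """All ways to split `events` into non-empty, unordered blocks."""
    if not events:
        return [[]]
    first, rest = events[0], events[1:]
    result = []
    for part in set_partitions(rest):
        # `first` opens a new block of its own ...
        result.append([[first]] + part)
        # ... or joins each existing block in turn.
        for k in range(len(part)):
            result.append(part[:k] + [[first] + part[k]] + part[k + 1:])
    return result


def partitions_with_blocks(events: List[int], i: int) -> List[List[List[int]]]:
    """Only the combinations that divide the events into exactly i blocks."""
    return [p for p in set_partitions(events) if len(p) == i]


combos = set_partitions([1, 2, 3, 4])
```

For four events this gives 15 candidate combinations (the Bell number B_4), with 1, 7, 6 and 1 combinations for i = 1, 2, 3 and 4 blocks. The count grows quickly with N_0, which is one motivation for a heuristic search.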
The maintenance cost in the remaining life period is then formulated, taking into account the impact of the repair degree on the failure rate and on the number of maintenance actions in the remaining life cycle. The mean value of the remaining life cycle of the system is also determined, and from these the maintenance cost rate after combination maintenance is obtained. The shared maintenance downtime matrix is defined analogously, where:
Δt_jk: The shared maintenance downtime of the kth sub-combination of the jth combination when the N_0 maintenance events are arbitrarily divided into i blocks.
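The exact remaining-life formulas are not reproduced here. As an illustration of how the repair degree can affect the failure rate and the number of later maintenance actions, the sketch below swaps in a Kijima-type virtual age model. After repair with degree δ, the age becomes (1 − δ) × age, so δ = 1 gives the initial condition and δ = 0 leaves the age unchanged (minimum maintenance). Later failures are assumed to be minimally repaired with a Weibull hazard, so the expected number of failures over the horizon is the increase in the cumulative hazard. The Weibull parameters, function names and cost per failure are all hypothetical.

```python
def virtual_age_after_repair(age: float, delta: float) -> float:
    """Kijima-type reduction: delta = 1 -> as good as new, delta = 0 -> minimal repair."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("maintenance degree must lie in [0, 1]")
    return (1.0 - delta) * age


def expected_failures(age: float, delta: float, horizon: float,
                      beta: float, eta: float) -> float:
    """Expected failures over `horizon` after repair, assuming a Weibull
    hazard (shape beta, scale eta) and minimal repair at each later failure."""
    v = virtual_age_after_repair(age, delta)

    def cum_hazard(t: float) -> float:
        return (t / eta) ** beta

    return cum_hazard(v + horizon) - cum_hazard(v)


def remaining_life_cost(age: float, delta: float, horizon: float,
                        beta: float, eta: float, cost_per_failure: float) -> float:
    """Expected maintenance cost of failures in the remaining life."""
    return expected_failures(age, delta, horizon, beta, eta) * cost_per_failure


n_complete = expected_failures(500.0, 1.0, 1000.0, beta=2.0, eta=1000.0)
n_minimum = expected_failures(500.0, 0.0, 1000.0, beta=2.0, eta=1000.0)
```

With these placeholder parameters, complete maintenance leads to 1.0 expected later failure and minimum maintenance to 2.0. This is the trade-off the model balances against the higher immediate cost of complete maintenance.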
From these, the shared maintenance downtime of all the maintenance events is obtained, as is the maintenance downtime in the remaining life period, again taking into account the impact of the repair degree on the failure rate and on the number of maintenance actions in the remaining life cycle. Together these give the availability after combination maintenance. The multi-event combination maintenance model is then formulated to achieve the highest availability and the lowest cost rate in both the decision-making cycle and the remaining life.
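One way to score a candidate combination on the two objectives is sketched below. Under this sketch's assumptions, the combined cost and downtime are the sums of the own parts minus the shared parts, plus the remaining-life terms. The cost rate divides total cost by the mean remaining life. Availability is taken as 1 − downtime / mean remaining life. These are simplified stand-ins for the paper's expressions, and the function name and numbers are hypothetical. Note that such an unconstrained ratio can become negative, as reported for some combinations in the numerical example.

```python
from typing import Sequence, Tuple


def evaluate_combination(own_costs: Sequence[float], shared_costs: Sequence[float],
                         own_downtimes: Sequence[float], shared_downtimes: Sequence[float],
                         life_cost: float, life_downtime: float,
                         mean_life: float) -> Tuple[float, float]:
    """Return (cost_rate, availability) for one candidate combination.

    Assumed forms: cost rate = total cost / mean remaining life;
    availability = 1 - total downtime / mean remaining life.
    """
    if mean_life <= 0:
        raise ValueError("mean remaining life must be positive")
    total_cost = sum(own_costs) - sum(shared_costs) + life_cost
    total_downtime = sum(own_downtimes) - sum(shared_downtimes) + life_downtime
    cost_rate = total_cost / mean_life
    availability = 1.0 - total_downtime / mean_life  # unconstrained; may go negative
    return cost_rate, availability


rate, avail = evaluate_combination([150.0, 200.0], [30.0], [0.3, 0.4], [0.1],
                                   life_cost=80.0, life_downtime=0.2, mean_life=100.0)
```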

Particle swarm optimization algorithm
Based on the features of the model, a particle swarm optimization algorithm based on interval segmentation is designed. The algorithm flow is shown in Fig 6.
Interval segmentation.
i. According to the health status and maintenance plan of the system, the number of different types of events is counted as N.
ii. Obtain the event set A = {1, 2, ..., i, ..., N}.
iii. The set A is divided into separate
i. Initialize the particle swarm: set the population size to 50.
ii. Given the form of the objective function, the maintenance cost rate function and the availability function are used as the respective fitness functions, so as to achieve the two goals of the lowest maintenance cost rate and the highest availability.
iii. The optimal solution is obtained with the particle swarm algorithm described in Section 2.3.
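With two fitness functions, candidates can be compared by Pareto dominance. A combination is discarded if another has no higher cost rate and no lower availability, and is strictly better in at least one. The sketch below also drops infeasible results with negative availability, as done for the numerical example. Using a Pareto filter is an assumption made for illustration; the paper selects the final combination by comprehensive consideration of both objectives. The function name and data are hypothetical.

```python
from typing import Dict, List, Tuple


def pareto_front(results: Dict[str, Tuple[float, float]]) -> List[str]:
    """results maps a combination name to (cost_rate, availability).

    Drops infeasible entries (availability < 0) and returns the names of
    non-dominated combinations (lower cost rate and higher availability are better).
    """
    feasible = {k: v for k, v in results.items() if v[1] >= 0}
    front = []
    for name, (cost, avail) in feasible.items():
        dominated = any(
            c2 <= cost and a2 >= avail and (c2 < cost or a2 > avail)
            for other, (c2, a2) in feasible.items() if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical results for four candidate combinations.
front = pareto_front({"A": (4.0, 0.99), "B": (3.5, 0.98),
                      "C": (5.0, 0.97), "D": (2.0, -0.1)})
```

Here D is removed for negative availability, and C is dominated by A. A and B remain as trade-offs for the decision maker.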

Numerical example
In this numerical example, we apply the model developed above and assess its validity. As the model notation shows, many opportunity maintenance intervals, maintenance costs and downtimes must be considered; in particular, the relationships among some of the cost and downtime parameters must be specified reasonably. We start with the basic model parameters. There are 4 components in the system, numbered 1, 2, 3 and 4. The system life is 50,000 hours. According to system records, the last trouble-free working times of components 1, 2, 3 and 4 were 400, 500, 600 and 300 hours, respectively. The decision-making start time is t_s = 1000. The maintenance costs and downtimes are assumed as follows:
i. The minimum maintenance cost function for component 1 is C_1^min(t) = t/8 + 100, and the complete maintenance cost function is C_1^max(t) = t^2/2 + 200. The corresponding maintenance cost function with the retention time is specified accordingly.
ii. The minimum maintenance cost for component 2 is 100 and the complete maintenance cost is 300; the corresponding maintenance cost function with the retention time is specified accordingly.
iii. The minimum maintenance cost for component 3 with the degradation time is C_3^min(t) = t/9 + 100, and the complete maintenance cost is C_3^max(t) = t^2/9 + 200. The corresponding maintenance cost function with the degradation time is specified accordingly.
iv. The maintenance cost function of Timed Event 4 due to preventive maintenance in advance is C_p4(t) = 30.
v. The minimum maintenance downtime for component 1 with the retention time is T_1^min(t) = t/100 + 0.2, and the complete maintenance downtime is T_1^max(t) = t/50 + 0.4. The corresponding maintenance downtime with the retention time is specified accordingly.
vi. The minimum maintenance downtime of component 2 is 0.2 and the complete maintenance downtime is 0.5; the corresponding maintenance downtime with the retention time is specified accordingly.
vii. The minimum maintenance downtime of Degradation Event 3 is T_3^min(t) = t/50 + 0.1, and the complete maintenance downtime is T_3^max(t) = t/25 + 0.2. The corresponding maintenance downtime with the degradation time is T_d3(t) = (t/50 + 0.1) × (1 + δ_3).
viii. The maintenance downtime of Timed Event 4 is 0.15, and its repair degree is 0.8.
ix. The shared maintenance cost function for components 1 and 2 due to structure correlation with the retention time is specified accordingly.
x. The shared maintenance cost function for components 3 and 4 due to function correlation with the degradation time is specified accordingly.
Using the multi-event combination maintenance model and its algorithm with 50 particles and 100 iterations, the optimization results are obtained. By varying the maintenance cost and time of component 3 failures in the remaining life cycle, the results listed in Table 2, Table 3 and Table 4 are obtained.
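The downtime functions above can be checked against the linear-in-δ reading. For Degradation Event 3, T_3^max(t) = 2 T_3^min(t), so interpolating T_3^min + δ_3 (T_3^max − T_3^min) reduces to T_3^min(t)(1 + δ_3), which is exactly the stated T_d3(t). The short check below confirms this at a few hypothetical degradation times. The helper names are illustrative, and the cost functions use the reading C_1^max(t) = t^2/2 + 200 of the extracted formula.

```python
def T3_min(t: float) -> float:
    return t / 50 + 0.1


def T3_max(t: float) -> float:
    return t / 25 + 0.2


def T_d3(t: float, delta: float) -> float:
    """Downtime of Degradation Event 3 as stated in item vii."""
    return (t / 50 + 0.1) * (1 + delta)


def interpolate(lo: float, hi: float, delta: float) -> float:
    """Linear interpolation in the maintenance degree delta."""
    return lo + delta * (hi - lo)


def C1_min(t: float) -> float:
    return t / 8 + 100


def C1_max(t: float) -> float:
    return t ** 2 / 2 + 200


# Largest gap between the stated T_d3 and the interpolated form over a small grid.
gap = max(abs(T_d3(t, d) - interpolate(T3_min(t), T3_max(t), d))
          for t in (0.0, 15.0, 100.0) for d in (0.0, 0.5, 1.0))
```

Component 1's downtime functions have the same doubling property (T_1^max = 2 T_1^min), while its cost functions do not. Under linear interpolation, component 1's cost therefore grows quadratically with the retention time as δ_1 approaches 1.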
Assuming that the maintenance costs were 200, 250, 200, and 100 and the maintenance times were 0.3, 0.45, 0.4, and 0.25 when the failure of components 1, 2, 3, and 4 occurred in the remaining life cycle, respectively, the results in Table 2 can be obtained.  Assuming that the maintenance costs were 200, 250, 400, and 100 and maintenance times were 0.3, 0.45, 1.2, and 0.25 when the failure of components 1, 2, 3, and 4 occurred in the remaining life cycle, respectively, the results in Table 3 can be obtained.
Assuming that the maintenance costs were 200, 250, 20000, and 100 and maintenance times were 0.3, 0.45, 4, and 0.25 when the failure of components 1, 2, 3, and 4 occurred in the remaining life cycle, respectively, the results in Table 4 can be obtained.
According to the results in Table 2, Table 3 and Table 4, some combinations have negative availability because the range of the objective function is not constrained. These results contradict reality and should be removed. A comprehensive analysis of the optimization results in the three tables shows that, when the maintenance cost and downtime of component 3 differ, different maintenance degrees have different effects. When the follow-up maintenance cost and downtime are small, minimum maintenance is more economical; when they are moderate, incomplete maintenance is more economical; and when they are large, complete maintenance is more economical. This is consistent with practical engineering experience. The choice of maintenance beginning time is also extremely important: the maintenance beginning times in the tables are the best times to conduct maintenance, at which the resulting cost and downtime are also the lowest.
Removing the combinations with negative availability from the tables gives the results in Fig 7, Fig 8 and Fig 9.
As Fig 7, Fig 8 and Fig 9 show, all the combinations displayed are feasible maintenance combinations. Among them, combination 15 is the traditional maintenance mode, and the remaining combinations are optimized combinations. In terms of availability, combination 8 achieves the highest availability, although the availabilities of all combinations are relatively close. In terms of cost rate, combination 7 achieves the lowest cost rate, and its reduction in cost rate relative to the other combinations is more pronounced. Therefore, considering both availability and cost rate, combination 7 is the best maintenance combination: Events 1 and 2 are combined and maintenance begins at t = 1000, and Events 3 and 4 are combined and maintenance begins at t = 1015. Since, by assumption, Events 1 and 2 have structure correlation and Events 3 and 4 have function correlation, the decision-making results are consistent with the assumptions, which supports the accuracy of the decision-making model.

Conclusion and discussion
The model proposed in this paper addresses the problem of maintenance event management in complex large-scale production systems. Aiming at the diversity, simultaneity and dynamics of maintenance events, a multi-event combination maintenance model is constructed to achieve the highest availability and the lowest cost rate over both the decision-making cycle and the remaining life of the system. Combining maintenance events makes maintenance more scientific and standardized.
The new contributions of this paper are as follows:
i. Maintenance correlations are summarized into four categories, on the basis of which shared maintenance downtime and cost models are constructed.
ii. Unlike traditional maintenance decision-making methods, which use a single decision variable and a single objective, this paper treats the combination of different maintenance events, the repair times and the maintenance degrees as optimization variables. The multi-event combination maintenance model is constructed to achieve the highest availability and the lowest cost rate over both the decision-making cycle and the remaining life of the system.
In this paper, we assume that the maintenance cost function and the maintenance downtime function are linear functions of the maintenance degree. In reality, however, maintenance cost and downtime functions take complex forms. Therefore, future research should investigate the maintenance cost and downtime functions of different equipment under different maintenance beginning times and degrees to enable more accurate decision-making.