Abstract
The compact arrangement of chemical storage tanks significantly increases the probability of domino effect accidents. The accident chain length, a critical parameter for assessing accident severity, enables rapid understanding of potential accident impacts and serves as a foundation for constructing accident scenarios in domino effect risk assessment. This study focuses on domino effect accidents in chemical storage tanks and analyzes in detail the factors influencing the accident chain length. Given the limitations of historical statistical data and quantitative risk evaluations, an intelligent method is developed to predict the accident chain length. A fully connected feedforward neural network (FC-FNN) is used to analyze 255 relevant accident cases from 1970 to 2024, with key features such as the types of substances involved and the operating conditions during accidents carefully screened. To compensate for insufficient data on storage tank volume, a small-scale augmentation is applied within the tolerable error range. Additionally, Shapley Additive Explanations (SHAP) is applied to optimize the feature set, reducing the number of features from 15 to 10 based on their contribution to the model’s predictions. The results show that the combined application of feature selection, data augmentation, and SHAP-based optimization significantly improves the model’s prediction performance. The prediction accuracy on the test set exceeds 0.978, demonstrating the effectiveness of the proposed approach.
Citation: Qi J, Zhang M, Yu G, Bo C (2025) Analysis and intelligent prediction of domino effect accidents in chemical storage tanks with a focus on accident chain length. PLoS One 20(9): e0331180. https://doi.org/10.1371/journal.pone.0331180
Editor: Guojin Qin, Southwest Petroleum University, CHINA
Received: May 30, 2025; Accepted: August 12, 2025; Published: September 2, 2025
Copyright: © 2025 Qi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data used in this study are derived from historical accident cases collected in the public domain, and their contents have been included in the manuscript and its supporting information files. Specifically, the dataset comprises 255 chemical storage tank accident cases (1970–2024), which are also available in the Figshare repository under the DOI: 10.6084/m9.figshare.29488916.
Funding: This study was supported by the Jiangsu Provincial Department of Science and Technology (BE2023809; Mingguang Zhang) and the Innovative Research Group Project of the National Natural Science Foundation of China (62333010; Cuimei Bo). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
1. Introduction
Benefiting from economies of scale, industrial chain integration, and environmental advantages, chemical industry parks (CIPs) have emerged rapidly worldwide [1]. However, this development pattern has introduced safety challenges: the high-density clustering of chemical plants and the substantial hazards they pose have increased the potential for safety accidents such as fires and explosions [1,2]. When an accident occurs, its impact can rapidly propagate to adjacent plants or enterprises, triggering a domino effect that results in a series of accidents [3]. Chemical storage tanks, as critical vulnerability points in the chemical industry, often store flammable and explosive substances, making them susceptible to accident propagation throughout industrial complexes [3,4].
A review of historical accidents involving hazardous chemicals shows that domino effect accidents are quite prevalent [3]. Accident chains, as crucial factors in identifying domino effect accidents, are essential for predicting accident trends and guiding accident prevention and control measures [5]. Text mining can be used to extract accident chains, thereby improving the accuracy of accident identification and prediction [6]. To elucidate the characteristics of domino effects, some scholars have conducted statistical analyses of domino accidents. However, such research is limited, primarily because there are too few accident cases for statistical purposes or the accident information is incomplete, making it difficult to ascertain whether a domino effect was involved [7]. Darbra et al. collected 225 domino effect accidents and summarized the areas most prone to domino effects and the most frequent accident sequences [8]. Hemmatian et al. analyzed 330 historical accidents and compared them with Darbra’s statistical results, finding that production areas (38.5%) and storage tank areas (33%) remained the most common locations for accidents, with a ratio of 6 between primary and secondary extended accidents [9]. Guohua Chen summarized the statistical analyses of accident cases by numerous scholars and found that secondary accidents are the most numerous in domino effect accident chains [10]. In “two-step” domino effect accidents, the most significant types are “fire→explosion” and “explosion→fire”; in “three-step” domino effect accidents, the most important types are “fire→explosion→fire” and “explosion→fire→explosion”. Zhang et al. analyzed 165 domino effect accidents worldwide and extracted the key accident nodes “fire→fire”, “fire→explosion”, “explosion→fire”, and “explosion→explosion” [4]. Liang et al. conducted a statistical analysis of 49 domino effect accidents involving atmospheric storage tanks and found that tank explosion accidents accounted for 55.1% [11]. Despite these insights, the lack of granular data on materials and equipment limits the applicability of statistical results. This gap has spurred interest in dynamic models capable of predicting, rather than merely describing, accident propagation [12–15].
Traditional research has had limited success in capturing the characteristics of domino effect accidents and applying them in practice, and dynamic accident models are urgently needed. At the same time, the increasing complexity of industrial systems has given rise to research on system resilience within the field of domino effect studies [16,17]. Resilience theory emphasizes that, when confronted with disturbances and shocks, a system can not only maintain its basic functions but also recover rapidly and adapt [18]. This theory was introduced to address the shortcomings of traditional research in analyzing the dynamic evolution of accidents and system response capabilities. In chemical industrial parks, system resilience is closely related to the prevention and control of domino effects: a highly resilient system can effectively prevent the spread of domino effects and limit the extension of accident chains when accidents occur [17,19]. This links resilience research to the study of domino effect accident chains and offers new ideas for controlling accident development and reducing accident losses. In recent years, some scholars have drawn on system resilience to enhance the ability of chemical industrial parks to withstand domino effects by optimizing park layout, strengthening equipment redundancy design, and establishing dynamic risk early-warning mechanisms [20]. These efforts seek more forward-looking and proactive risk prevention and control strategies built on existing accident models and statistical analyses. While resilience theory provides a conceptual framework, its quantitative application to domino effects remains underdeveloped, necessitating data-driven approaches such as machine learning [21]. This further highlights the need to explore new methods and develop more effective prediction models.
Conventional statistical methods are inadequate for characterizing domino effects and fail to satisfy the stringent requirements of modern chemical process safety. Domino accidents in industrial settings typically cause far greater losses than isolated incidents. Consequently, significant research efforts have focused on domino effect risk identification, assessment, and management [22–29]. All of these efforts involve identifying and constructing domino effect accident scenarios, and the formulation of domino effect emergency response plans and the dispatch of emergency resources rely on an accurate grasp of these scenarios [30–32]. The chain length of domino effect accidents, a parameter influenced by multiple factors such as the types of substances involved, the accident area, and the accident’s propagation mode, effectively reflects the overall severity of the accident. It provides a basis for identifying key nodes in accident scenario construction and supports domino effect risk assessment and emergency management.
With advances in machine learning theory, computer hardware, and programming languages, machine learning has been increasingly applied in the chemical industry to extract information, identify patterns, and make predictions [33]. Many scholars have combined data with deep learning algorithms to make predictions about target objects, achieving relatively high accuracy and overcoming the difficulty that traditional static data have in accommodating dynamic changes [34–36]. Using historical data to predict the development trend of accidents has become a research hotspot. The fully connected feedforward neural network offers distinct advantages in prediction accuracy and generalization ability, effectively compensating for the deficiencies of traditional empirical models [37–39]. Statistical data on chemical storage tank accidents are characterized by many technical terms and strong correlations, and the fully connected architecture of the FC-FNN is well suited to these characteristics. When dealing with complex textual data, word embedding techniques can better capture semantic information and reveal hidden semantic relationships between words [40–42]. SHapley Additive exPlanations (SHAP), when combined with machine learning models, can quantify the contribution of each feature to model predictions. As a result, SHAP values have emerged as a key metric for evaluating feature importance in predictive models, enabling feature optimization and enhancing model interpretability [43,44]. Given the multifaceted challenges in domino effect accident research, the urgent need for precise prediction and efficient management, and the distinctive merits of the fully connected feedforward neural network and SHAP values, this study applies these techniques together. It analyzes the chain length of domino effect accidents in depth and builds a more accurate and effective prediction model, with the aim of supporting risk assessment, emergency management, and accident scenario construction for such accidents.
Specifically, data on chemical storage tank domino accidents from 1970 to 2024 were exhaustively collected, and a fully connected feedforward neural network (FC-FNN) [45] using word embedding for text classification was constructed to intelligently predict the accident chain length of domino effect accidents in chemical storage tanks. In the feature selection phase, a statistical analysis of 255 historical accidents was performed, and the crucial factors influencing chain length were chosen as feature inputs. To verify the rationality of feature selection, SHAP values were introduced; by quantifying the contribution of each feature to the model’s predictions, the feature selection process was further optimized. Moreover, data augmentation techniques were applied to ensure the precision of the prediction model, thereby providing a novel approach for constructing accident scenarios for this category of accidents.
The remainder of this paper is organized as follows. Section 2 elaborates on the basic concepts of domino effect accidents and the length of their accident chains, introduces a feedforward neural network based on word embedding suited to the statistical characteristics of domino effect accident data, and introduces SHAP theory for identifying highly relevant features. Section 3 explains the designed algorithm and its implementation. Section 4 presents the analysis results for domino effect accidents in chemical storage tanks, compares them with existing literature, and discusses the experimental results of the intelligent prediction of accident chain length. Section 5 concludes the paper.
2. Related work
Firstly, this section defines domino effect accident and the chain length. Then, a Word Embedding-based FC-FNN tailored to the statistical characteristics of domino effect accident data is introduced. Finally, the SHAP theory is incorporated to enhance the model’s interpretability and conduct feature importance analysis. These steps establish the theoretical framework for understanding the predictive modeling approach of domino effects and the interpretability framework subsequently employed in this study.
2.1. Definition of domino effect accident
As a high-impact, low-probability (HILP) accident scenario, domino effect accidents have attracted growing public concern [46]. Because of their inherent complexity, precisely defining these accidents is a formidable challenge [47], and the academic community has yet to reach consensus on a standardized definition. Despite this controversy, in the chemical industry the term “domino effect” refers to a series of accidents in which the primary accident, usually a fire or explosion, triggers further accidents and escalates the overall consequences of the event [48].
Currently, the most widely accepted definition of a ‘domino effect accident’ stems from the necessary conditions for the existence of a domino effect proposed by Cozzani [49]: ① an initial accident scenario exists, which initiates the domino accident; ② the physical effects of the initial event escalate the accident, meaning that the propagation of the initial accident must cause the failure of at least one secondary equipment unit; ③ at least one secondary accident scenario occurs; ④ there is an accident escalation effect, i.e., the overall severity of the domino accident exceeds that of the initial accident.
In the subsequent research of this paper, the following terminologies will be employed:
- (1) Initial accident: The first accident triggered by accidental factors.
- (2) Accident expansion: The process whereby an accident in one storage tank induces an accident in another adjacent storage tank. This process can be the transmission of the same type of accident or the transmission of different types of accidents.
2.2. Definition of chemical storage tank domino effect accident chain
According to disaster system theory, disasters result from the interaction of hazard-inducing factors, disaster-pregnant environments, and disaster-bearing bodies [50]. In chemical industrial parks, hazard-inducing factors are primarily categorized as natural-disaster-related and technological-disaster-related factors. The latter include the domino effect escalation factors, namely fire heat radiation, explosion shock waves, and explosion fragments. In this study, the disaster-bearing body is the tank farm. As the accident chain extends, it spreads predominantly through hazardous equipment units that can trigger fires and explosions upon failure, affecting equipment, personnel, and the environment via these hazard-inducing factors. The disaster-pregnant environment encompasses the natural and human environments inside and outside the park, as well as the interrelationships among disaster-bearing bodies, including meteorological conditions, personnel distribution, management factors, and park layout. The severity of disaster consequences is jointly determined by the hazard of the hazard-inducing factors and the vulnerability of the disaster-bearing bodies.
In the evolution of a domino effect accident, the length of the accident chain is defined as the number of accident propagation events from the initial accident facility to the end of a series of subsequent chain reactions. The initial accident facility marks the starting point, while the final impacted facility denotes the endpoint of the accident chain. This length is determined by calculating the number of times the accident propagates between facilities, rather than simply counting the total number of affected facilities. For instance, if an accident spreads from Facility A to B, and then from B to C, the length of the accident chain is 2 (A → B → C, with two propagation events). When the accident spreads simultaneously from A to B and A to C, the chain length is 1 (since A → B and A → C are parallel propagation paths, with a single propagation event). In a scenario where the accident spreads from A to B, then from B to C, and concurrently from A to D, the chain length is 2, as the longest propagation path (A → B → C) involves two propagation events. The accident chain length is intricately linked to the hazard of hazard-inducing factors and the vulnerability of the affected carrier. Consequently, it is influenced by multiple factors, including the types of substances involved, the accident location, and the mode of accident expansion. The accident chain length serves as an effective indicator of the severity of such disaster consequences. It facilitates the quantitative risk assessment of such accidents by pinpointing risk sources and high-risk areas for scenario construction, and also aids in identifying the origin and critical links of accidents.
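The chain-length rule above (count propagation events along the longest path from the initial facility, not the number of affected facilities) can be expressed as a longest-path computation over a directed propagation graph. The following Python sketch is illustrative only and is not the authors' code; the function name `chain_length` and the edge-list input format are assumptions, and the propagation graph is assumed to be acyclic.

```python
from collections import defaultdict
from functools import lru_cache


def chain_length(edges, initial):
    """Number of propagation events on the longest path from `initial`.

    edges: iterable of (source, target) pairs, one per propagation event.
    Assumes the propagation graph is acyclic.
    """
    children = defaultdict(list)
    for source, target in edges:
        children[source].append(target)

    @lru_cache(maxsize=None)
    def longest_from(facility):
        # A facility with no onward propagation ends its branch (length 0).
        return max((1 + longest_from(nxt) for nxt in children[facility]), default=0)

    return longest_from(initial)


# The three worked examples from the text.
print(chain_length([("A", "B"), ("B", "C")], "A"))              # 2 (serial)
print(chain_length([("A", "B"), ("A", "C")], "A"))              # 1 (parallel)
print(chain_length([("A", "B"), ("B", "C"), ("A", "D")], "A"))  # 2 (longest path)
```

The sketch reproduces the three worked examples in the text. Memoizing `longest_from` means each facility is evaluated once, so the cost grows linearly with the number of facilities and propagation events.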
2.3. Word embedding-based feedforward neural network
Accurate intelligent prediction of domino effect accidents in chemical storage tanks requires statistical analysis of existing relevant cases. Because the statistical items are numerous and all key information is extracted from textual data, we chose to build a fully connected feedforward neural network based on word embeddings for text classification. This architecture has strong representation learning capabilities: it transforms input data through layered linear and nonlinear operations, gradually converting them into more abstract and hierarchical feature representations. In this way, it captures the inherent structures and patterns in the input data, which improves prediction accuracy for domino effect accidents in chemical storage tanks.
To process textual data effectively, this paper utilizes Word2Vec [51] embeddings to transform the selected features from text data into vectors. This approach encodes the semantic relationships between words, providing meaningful inputs for the neural network and enabling efficient feature extraction and classification. Each word embedding serves as a feature input, which undergoes nonlinear transformation in the hidden layers using the Rectified Linear Unit (ReLU) activation function. Neurons in each layer are connected to the previous layer through weights, facilitating the gradual extraction and representation of abstract features within the input text. At the output layer, the softmax function calculates the probability distribution across different categories, determining the likelihood of the text belonging to each class.
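As a rough illustration of how textual feature values could become a numeric input vector, the sketch below averages token vectors within each feature value and concatenates the results across features. The tiny hand-written embedding table stands in for a trained Word2Vec model, and the feature names, token averaging, and concatenation order are assumptions for illustration rather than details reported in this study.

```python
# Hypothetical 3-dimensional vectors standing in for a trained Word2Vec model.
TOY_EMBEDDINGS = {
    "gasoline": [0.8, 0.1, -0.3],
    "lpg": [0.7, -0.2, 0.5],
    "storage": [0.0, 0.4, 0.0],
    "tank": [0.2, 0.2, 0.2],
    "atmospheric": [0.2, -0.8, 0.1],
    "pressurized": [-0.1, 0.9, 0.2],
    "leak": [0.4, 0.3, 0.6],
}
DIM = 3


def embed_value(text):
    """Average the vectors of the known tokens in one feature value."""
    vectors = [TOY_EMBEDDINGS[t] for t in text.lower().split() if t in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM  # no known token: fall back to a zero vector
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(DIM)]


def record_to_input(record, feature_order):
    """Concatenate per-feature vectors in a fixed order into one input vector."""
    x = []
    for name in feature_order:
        x.extend(embed_value(record.get(name, "")))
    return x


# Hypothetical feature names and one accident record.
features = ["initial_device_name", "pressure_state", "initial_accident_type"]
record = {
    "initial_device_name": "Gasoline storage tank",
    "pressure_state": "atmospheric",
    "initial_accident_type": "leak",
}
x = record_to_input(record, features)
print(len(x))  # 9 (3 features x 3 dimensions)
```

With a real Word2Vec model, the lookup table would be replaced by the trained word vectors. The fixed feature order matters because the network sees vector positions, not feature names.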
The fully connected layer performs both linear and nonlinear transformations on the input data for feature extraction and transformation. Its implementation is described by Eqs (1.1) and (1.2).

$$\mathbf{z} = \mathbf{W}\mathbf{x} + \mathbf{b} \tag{1.1}$$

where $\mathbf{z}$ represents the output of the linear transformation, i.e., the weighted sum of the inputs plus a bias term; $\mathbf{W}$ is the weight matrix, where each element represents the weight of a connection between an input feature and a neuron; $\mathbf{x}$ is the input vector to the layer, with each element representing an input feature; and $\mathbf{b}$ is the bias term, added to the weighted sum to allow the model to better fit the data by shifting the activation function.

$$\mathbf{a} = \mathrm{ReLU}(\mathbf{z}) \tag{1.2}$$

where $\mathbf{a}$ is the output of the ReLU activation function; $\mathrm{ReLU}(\cdot)$ denotes the Rectified Linear Unit activation function, defined element-wise as $\mathrm{ReLU}(z) = \max(0, z)$, which introduces non-linearity to the model by outputting the input directly if it is positive and zero otherwise; and $\mathbf{z}$ is the input to the ReLU function, i.e., the output of the linear transformation in Eq (1.1).

During training, Dropout is employed to randomly discard a portion of neurons, enabling the network to learn more robust and generalized features while reducing co-adaptation between neurons. Eq (1.3) represents the implementation of the Dropout layer.

$$\mathbf{y} = \mathbf{m} \odot \mathbf{x}, \qquad m_i \sim \mathrm{Bernoulli}(1 - p) \tag{1.3}$$

where $\mathbf{x}$ is the input to the Dropout layer, $\mathbf{y}$ is the output of the Dropout layer, $p$ is the dropout probability, $\odot$ denotes element-wise multiplication, and $\mathbf{m}$ is a randomly generated binary mask of the same shape as $\mathbf{x}$, where 0 indicates a neuron that is dropped out and 1 indicates a neuron that is retained.
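The fully connected layer, ReLU activation, Dropout layer, and softmax output described in this section can be sketched in plain Python as follows. The weights and inputs are toy values, the function names are hypothetical, and the dropout draws a binary mask that drops each neuron with probability p, as described in the text. This is classic (non-inverted) dropout, so activations are scaled by (1 - p) at inference; frameworks such as Keras and PyTorch instead use inverted dropout, dividing kept activations by (1 - p) during training and leaving inference unscaled.

```python
import math
import random


def dense(W, x, b):
    """Fully connected layer: z = Wx + b, with W given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(z):
    """Element-wise max(0, z)."""
    return [max(0.0, v) for v in z]


def dropout(x, p, rng, training=True):
    """Dropout as a binary mask: each entry is kept with probability 1 - p."""
    if not training:
        # Classic (non-inverted) dropout: scale by the keep probability at
        # inference so activations match their expected training-time value.
        return [(1 - p) * v for v in x]
    mask = [1 if rng.random() >= p else 0 for _ in x]  # 0 = dropped, 1 = kept
    return [m * v for m, v in zip(mask, x)]


def softmax(logits):
    """Output-layer class probabilities; subtracting the max avoids overflow."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy forward pass: 3 inputs -> 2 hidden ReLU units -> 3 chain-length classes.
rng = random.Random(0)
W1, b1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]
W2, b2 = [[1.0, -1.0], [-0.5, 0.5], [0.2, 0.2]], [0.0, 0.0, 0.0]
x = [1.0, 0.5, -1.0]

hidden = dropout(relu(dense(W1, x, b1)), p=0.2, rng=rng, training=True)
probs = softmax(dense(W2, hidden, b2))
print(probs, sum(probs))
```

The predicted chain-length class would be the index of the largest entry in `probs`.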
2.4. SHAP value theory
To improve model interpretability, we incorporated SHAP values for feature selection. Rooted in Shapley values from game theory, SHAP values enable a fair apportionment of each feature’s contribution to the model’s prediction outcomes. We employed SHAP values to analyze the feature importance of the Word Embedding-based Feedforward Neural Network and selected the most influential features for model retraining.
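For the model prediction $f(x)$, the SHAP value $\phi_i$ of feature $i$ is defined as:

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,\left(|F| - |S| - 1\right)!}{|F|!}\left[f_{S \cup \{i\}}\left(x_{S \cup \{i\}}\right) - f_S\left(x_S\right)\right]$$

where $F$ is the set of all features, $S$ is a feature subset, and $f_S(x_S)$ is the model prediction based on the feature subset $S$.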
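For a model with only a few features, the Shapley definition can be evaluated exactly by enumerating every subset S. In the sketch below, the subset prediction f_S(x_S) is approximated by evaluating the model with the features outside S set to fixed baseline values. This is one simple assumption among several (the SHAP library's explainers instead average over background data), and because the cost grows as 2^|F| model evaluations per feature, practical SHAP tools rely on approximations or model-specific algorithms. The function and model names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every feature subset S.

    f_S(x_S) is approximated by evaluating `model` on a copy of x whose
    features outside S are replaced by `baseline` (an assumption).
    """
    n = len(x)

    def f_subset(S):
        return model([x[j] if j in S else baseline[j] for j in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # |S| runs from 0 to n - 1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                S = set(subset)
                total += weight * (f_subset(S | {i}) - f_subset(S))
        phi.append(total)
    return phi


# Toy model with one additive term and one interaction term.
def toy_model(z):
    return 2 * z[0] + z[1] * z[2]


x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # approximately [2.0, 3.0, 3.0]
print(sum(phi), toy_model(x) - toy_model(baseline))  # local accuracy: these agree
```

In the toy model, the interaction term `z[1] * z[2]` is split equally between features 1 and 2, and the SHAP values sum to the difference between the prediction and the baseline prediction (the local accuracy property).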
3. Proposed method
In this section, we lay the groundwork for an intelligent model that predicts the accident chain length of domino effect accidents in chemical storage tanks, by collecting historical data on such accidents and analyzing them to identify the factors influencing accident chain length. For model development, we first outline the proposed algorithm framework and then describe feature selection and data augmentation in detail.
3.1. Framework
We collected data on domino effect accidents involving atmospheric and pressurized storage tanks from 1970 to 2024, both in China and internationally, and summarized and statistically analyzed them. Among the evolutionary characteristics of domino effect accidents, the accident chain length is a critical parameter for characterizing the overall severity of an accident. For a comprehensive analysis, we consider a wide range of statistical characteristics, including but not limited to: the category of the initial accident device, the name of the initial accident device, the form of the storage tank, the tank capacity, the tank material, the initial accident medium, the pressure state, the type of initial medium, the type of initial accident, the secondary accident unit, the accident chain length, the status of the unit at the time of the accident, and the location of the initial accident.
The framework of this study is structured as follows:
- Data Preprocessing: The raw data is preprocessed through feature selection and data augmentation to ensure its suitability for modeling.
- Feature Embedding: The preprocessed text data is converted into vector representations using the Word2Vec embedding technique, enabling the use of machine learning models.
- Chain Length Prediction: A fully connected feedforward neural network is trained on the embedded data to predict the chain length of domino effect accidents.
- Feature Optimization with SHAP: SHAP values are used to identify the most influential features and enhance the interpretability of the model.
Algorithm 1. Intelligent Prediction Model for Domino Effect Accident Chain Length
Data collection and analysis of Domino Effect Accidents in Chemical Storage Tanks
Input: Summary Data of Domino Effect Accidents
//Feature Selection:
1: Extract Input Features & Corresponding Chain Lengths
//Data Augmentation:
1: if Tank Capacity Present
2:     Extract Unit + Value
3:     Numeric Value × random(0.99 ~ 1.01);
4: end if Blank Tank Capacity
5: Duplicate this row of data;
//Model Training and Prediction:
1: Feature Matrix → Word2Vec → Numeric Vector
2: Numeric Vector + Corresponding Chain Length → Feedforward Neural Network
//SHAP-based Feature Optimization:
1: Apply SHAP to the trained FC-FNN model
2: Calculate Shapley values for each feature to quantify its contribution to the model’s predictions
3: Rank features based on SHAP values to identify the most influential ones
4: Refine the feature set by retaining only the top influential features
5: Retrain FC-FNN with the optimized feature set for improved prediction accuracy
Output: Chain Length and Feature Importance Analysis
3.2. Data collection and analysis of Domino Effect Accidents in Chemical Storage Tanks
This paper collects 255 domino effect accident cases involving atmospheric and pressurized storage tanks and their auxiliary pipelines, sourced from Chinese and international petrochemical enterprises spanning 1970–2024 (with all cases recorded through the end of 2024). These cases serve as analytical samples to characterize tank-related domino accidents. Specifically, 143 cases are identified as atmospheric storage tank domino accidents, and 95 as pressurized storage tank domino accidents.
To show the trend in the number of storage tank accidents in China, five-year periods are used to analyze the number of domino accidents involving atmospheric and pressurized storage tanks. As shown in Fig 1, the number of accidents first increases and then decreases: it rises from 1970 to 2010 and declines after 2010. Domino accidents involving atmospheric storage tanks account for more than 50% of the total.
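The five-year aggregation behind Fig 1 amounts to mapping each accident year to the first year of its period. A minimal sketch follows; the function name is hypothetical, and the accident years are invented for illustration and are not the study's dataset.

```python
from collections import Counter


def five_year_counts(years, start=1970):
    """Count accidents per five-year period, keyed by the period's first year."""
    return Counter(start + 5 * ((year - start) // 5) for year in years)


# Hypothetical accident years, not the study's data.
counts = five_year_counts([1971, 1974, 1975, 1989, 2008, 2009, 2012])
for period in sorted(counts):
    print(f"{period}-{period + 4}: {counts[period]}")
```

With `start=1970`, the 1970 to 2024 span divides into eleven periods, the last being 2020 to 2024.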
3.2.1. Substance type analysis.
Table 1 summarizes 143 domino accidents involving 31 types of hazardous materials in atmospheric storage tanks. Among liquid substances, gasoline accounts for the highest proportion of atmospheric storage tank accidents (21.7%). Among explosive volatile gases, oil evaporation gas (18.8%), hydrocarbon-containing evaporation gas (18.8%), and benzene-containing evaporation gas (18.8%) are the top three contributors to atmospheric storage tank accidents.
Table 2 summarizes 95 initial material accidents in pressurized storage tank areas, involving 15 types of substances, which are categorized into liquefied gases (60%) and compressed gases (40%). Among liquefied gases, liquefied petroleum gas (LPG) is the most frequent initial material (29.5%). Among compressed gases, explosive mixed gases (combinations of CH4, H2, etc.) are the most frequent initial material (14.7%).
3.2.2. Operational status during accident occurrence and cause analysis.
Studying the operational status at the time of an accident is highly effective for identifying high-risk operations and reducing the probability of accidents. According to the statistics, 40.4% of atmospheric storage tank accidents occurred under normal working conditions and 15.4% during inspection and maintenance. For pressurized storage tanks, the corresponding proportions are 56% and 19%.
The analysis of accident causes is crucial for the prevention of domino accidents. Based on the accident causation theory of chain reaction and incorporating the categories of accident causes described in MHIDAS, a classification analysis of the contributing factors to the accidents under study has been conducted. Human factors are identified as the primary cause of accidents (50%), followed by equipment factors (27.9%) and environmental factors (24.3%). Among human factors, violation of work regulations (20.6%) accounts for the highest proportion, closely followed by improper operation (19.9%).
3.2.3. Abbreviated analysis of accident chain evolution & propagation.
The most prevalent accident chain in atmospheric storage tanks involves gasoline, with the common modes being “Leak-Explosion-Fire→Fire” and “Leak-Fire→Fire”. Crude oil follows, primarily causing “Leak-Fire→Fire”. Explosive volatile gases often lead to “Explosion→Fire”. Leakage is the most frequent initial event triggering domino effects in atmospheric tank farms. As shown in Table 3, the most common spread mode among adjacent tanks is “Fire→Fire” (59, 39.9%), followed by “Explosion→Fire” (36, 24.3%) and “Fire→Explosion” (35, 23.6%).
Because the propagation of accident chains in tank farms is not endless, once human control measures are implemented the accident chain terminates at some point without affecting all tanks in the tank farm. To study the transmission length of accident chains, the average accident chain length is defined as the sum of the lengths of all accident chains divided by the number of accidents. Here, the length of an accident chain is the number of times the accident expands; if an accident expands once, its chain length is 1. According to the analysis in Table 4, the average chain length of the 143 accidents is 1.17. This indicates that, in most cases, the domino effect in atmospheric storage tank farms terminates after affecting two tanks owing to human emergency intervention. In a minority of cases, delays in the initial emergency response allow the domino effect to expand and trigger accidents in multiple tanks in the tank farm. The average accident chain length thus provides basic data for constructing accident chain evolution and expansion scenarios in atmospheric tank farms.
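The average chain length defined above is a frequency-weighted mean over accidents. The sketch below computes it from a table of chain-length counts; the function name is hypothetical, and the counts are invented for illustration and are not the values in Table 4.

```python
def average_chain_length(frequency):
    """Sum of all chain lengths divided by the number of accidents.

    frequency maps a chain length to the number of accidents with that length.
    """
    n_accidents = sum(frequency.values())
    total_length = sum(length * count for length, count in frequency.items())
    return total_length / n_accidents


# Hypothetical counts (not Table 4): 85 accidents of length 1, 12 of length 2, 3 of length 3.
print(average_chain_length({1: 85, 2: 12, 3: 3}))  # 1.18
```

The same calculation applies to the pressurized tank farm statistics in Table 6.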
In pressure tank accidents, leaks account for the highest proportion (80%) of initial incidents. Among these, secondary first-order accidents occur at a rate of 30.5%, with “Leak-Explosion” being the most frequent chain. For secondary second-order accidents (48.4%), the most prevalent chain is “Leak-Explosion-Fire”. As for secondary third-order accidents (21.1%), the most common chain is “Leak-Dispersion-Explosion-Fire”. As shown in Table 5, the most common propagation mode among adjacent tanks in pressure tank farms is “Explosion → Fire” (40, 34.8%), followed by “Leak → Explosion” (30, 26.1%) and “Dispersion → Explosion” (22, 19.1%).
Calculations based on Table 6 reveal that the average extension length of accident chains in pressure tank farms is 1.91, indicating that once an accident occurs in a pressure tank farm, it is highly likely to lead to secondary second-order accidents.
3.3. Feature selection
In the feature selection stage, we first conducted an analysis based on domain knowledge and historical data, and selected 15 features that might be related to the length of the accident chain. These features include weather conditions, air temperature, alarm response speed, the category of the initial accident device, the name of the initial accident device (named in the form of “medium + storage tank; process + device; medium + device”), the form of the storage tank, the tank capacity, the tank material, the initial accident medium, whether it is under normal pressure or pressurized, the type of the initial medium, the type of the initial accident, the secondary accident device, the status of the device at the time of the accident, and the location of the initial accident. To verify the rationality of these features, SHAP values were used to calculate the contribution of each feature and conduct an assessment.
The 15 identified features are sequentially labeled as Feature 01 to Feature 15, and SHAP is applied to evaluate their importance. Fig 2 presents an aggregated SHAP-based feature importance plot, highlighting the top 10 critical features for predicting accident chain length. Notably, the initial accident medium (Feature 09) exhibits the highest SHAP value, indicating its dominant role in propagating domino effects, which aligns with statistical findings that over 80% of domino accidents involve flammable/explosive substances. Second in importance is the storage tank pressure condition (Feature 10), as pressurized tanks are more prone to secondary accident escalation, leading to accident chains ≥2 in length. The equipment operational state during the accident (Feature 14) ranks third, with most incidents occurring under normal operating conditions, primarily due to human error (e.g., non-compliant operations). Subsequent key features include initial accident type (Feature 12), storage tank volume (Feature 07), initial medium classification (Feature 11), primary accident equipment identifier (Feature 05), secondary accident equipment (Feature 13), primary equipment category (Feature 04), and accident location (Feature 15).
SHAP value analysis reveals that certain features contribute very little to the prediction of accident chain length. For example, alarm response speed and storage tank material show low contributions because data on them are rarely available in accident reports. Features such as weather conditions, storage tank form, and air temperature also have low SHAP contributions and are therefore excluded from the final feature set. In summary, the 10 most influential features were retained for subsequent model training and prediction.
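The screening rule described above ranks features by their overall SHAP contribution and keeps the top 10. It can be sketched as follows in plain Python.

The sketch assumes the commonly used aggregate, the mean absolute SHAP value per feature. The text does not state the exact aggregate behind Fig 2. The SHAP matrix in the usage block is a randomly generated placeholder, not the study's data. In practice it would come from a SHAP explainer applied to the trained FC-FNN.

```python
import random


def rank_features_by_shap(shap_matrix, feature_names):
    """Return (name, mean |SHAP|) pairs sorted from most to least important.

    shap_matrix: one row per sample, each row holding one SHAP value per feature.
    """
    n_samples = len(shap_matrix)
    importance = [
        sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, importance), key=lambda p: p[1], reverse=True)


def select_top_k(shap_matrix, feature_names, k=10):
    """Keep the k features with the largest mean absolute SHAP value."""
    return [name for name, _ in rank_features_by_shap(shap_matrix, feature_names)[:k]]


if __name__ == "__main__":
    names = [f"Feature {i:02d}" for i in range(1, 16)]  # Feature 01 .. Feature 15
    rng = random.Random(0)
    # Placeholder SHAP values: each feature gets its own random spread.
    scales = [rng.uniform(0.01, 1.0) for _ in names]
    shap_matrix = [[rng.gauss(0.0, s) for s in scales] for _ in range(50)]
    kept = select_top_k(shap_matrix, names, k=10)
    print(len(kept), kept[:3])
```

A fixed cut-off of k = 10 mirrors the reduction from 15 to 10 features described in the text. A threshold on the importance values would be an equally simple alternative.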
3.4. Data augmentation
To mitigate the issue of limited data, we partition the dataset into two subsets according to whether the storage tank capacity feature is missing. For the subset with known tank capacity, data augmentation is performed to enlarge the sample size. Although this augmentation may alter the apparent occurrence frequency of such accidents in the dataset, it does not compromise the prediction accuracy of domino effect accident chain lengths involving chemical storage tanks.

According to Spherical and Large-Scale Storage Tanks [52], there are four fundamental concepts of tank capacity: calculated capacity, nominal capacity, actual capacity (storage capacity), and operating capacity. The actual capacity (storage capacity) is the maximum volume that a tank can physically hold, as determined by its design and specifications. It is typically the value reported in accident investigation reports.

To augment the tank capacity feature, we use the permissible dimensional errors specified in relevant standards. According to GB 50341 Design Code for Vertical Cylindrical Steel Welded Storage Tanks [53], the allowable manufacturing error for storage tank capacity is ±1%. Data augmentation is therefore performed on the tank capacity feature within 0.99 to 1.01 times the original value. This keeps the augmented values within industrial tolerances while expanding the dataset. The augmented data are then added to the training set of the neural network model to improve prediction accuracy.
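The capacity jittering described above can be sketched in a few lines of Python, under these assumptions:

- Records are plain dictionaries with hypothetical field names (`capacity_m3`, `chain_length`, `medium`).
- Each copy draws one uniform scaling factor from [0.99, 1.01].
- The number of copies per record is a free parameter, `n_copies`, because the text does not state how many copies were generated.

This is an illustration, not the authors' code.

```python
import random


def augment_capacity(records, n_copies, low=0.99, high=1.01, seed=0):
    """Create n_copies jittered variants of each record with a known tank capacity.

    Each copy multiplies 'capacity_m3' by a factor drawn uniformly from [low, high],
    i.e. within the +/-1 % capacity tolerance cited from GB 50341. Records whose
    capacity is missing (None) are skipped, mirroring the split described in the text.
    All other fields, including the chain-length label, are copied unchanged.
    """
    rng = random.Random(seed)
    augmented = []
    for rec in records:
        if rec.get("capacity_m3") is None:
            continue
        for _ in range(n_copies):
            new_rec = dict(rec)
            new_rec["capacity_m3"] = rec["capacity_m3"] * rng.uniform(low, high)
            augmented.append(new_rec)
    return augmented


if __name__ == "__main__":
    # Hypothetical toy records; only the first has a known capacity.
    records = [
        {"medium": "gasoline", "capacity_m3": 5000.0, "chain_length": 1},
        {"medium": "LPG", "capacity_m3": None, "chain_length": 2},
    ]
    for rec in augment_capacity(records, n_copies=3):
        print(round(rec["capacity_m3"], 1), rec["chain_length"])
```

The label is left untouched, so the augmented copies enlarge the training set without changing the prediction target. Applying the function only to training records, as the text describes, keeps near-duplicates of test cases out of training.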
From a fairness perspective, we also perform a replication operation on the non-missing tank capacity data to address potential biases arising from data imbalance in machine learning. This approach ensures that the model is trained on a balanced dataset, thereby improving its generalization ability and reducing the risk of overfitting to the majority class.
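The text does not specify how the replication ratio is chosen. One common way to counter class imbalance by replication is random oversampling, in which records of under-represented classes are duplicated until each class is as large as the majority class.

The sketch below illustrates that general technique, not the authors' exact procedure. It assumes the class label is the accident chain length, stored under a hypothetical field name, `chain_length`.

```python
import random
from collections import Counter, defaultdict


def oversample_by_replication(records, label_key="chain_length", seed=0):
    """Random oversampling: duplicate records of smaller classes until every
    class is as large as the largest (majority) class."""
    if not records:
        return []
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_key]].append(rec)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(dict(rec) for rec in group)
        # Extra copies are drawn with replacement from the same class.
        balanced.extend(dict(rng.choice(group)) for _ in range(target - len(group)))
    return balanced


def class_counts(records, label_key="chain_length"):
    """Number of records per class label."""
    return Counter(rec[label_key] for rec in records)


if __name__ == "__main__":
    # Hypothetical toy records in which chain length 1 dominates.
    data = [{"chain_length": 1}] * 6 + [{"chain_length": 2}] * 3 + [{"chain_length": 3}]
    print(class_counts(data))
    print(class_counts(oversample_by_replication(data)))
```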
4. Experimental results and analysis
4.1. Statistical feature result analysis
Through comparisons with historical literature, Table 7 reveals five distinct characteristics of domino accidents in chemical storage tanks:

- Flammable/explosive substances such as liquefied petroleum gas, gasoline, and crude oil remain the primary initial accident materials, consistent with prior studies.
- Tank farms (48%) and production areas (18%) emerge as high-risk zones, in line with broader chemical industry trends.
- The dominant expansion sequences, “fire→explosion,” “explosion→fire,” and “fire→fire,” show no significant deviation from traditional accident statistics.
- Atmospheric storage tank accidents still primarily involve a single expansion (78%). Pressure storage tank accidents, by contrast, exhibit a notably higher rate of secondary or higher-order expansions (69.5%), exceeding benchmark values.
- In contrast to historical datasets, leakage is identified as the predominant initial event, particularly in pressure storage tank accidents (80% incidence). This highlights a distinct risk profile for pressurized systems.
4.2. Intelligent prediction result analysis
4.2.1. Model performance evaluation: Accuracy metric.
Model accuracy is obtained by comparing the accident chain length predicted by the model with the actual accident chain length. Specifically, a feedforward neural network model is trained on historical accident data. On the test set, the predicted chain lengths are compared with the actual chain lengths extracted from the historical accident records, and accuracy is calculated from this comparison. This allows the performance of the model to be evaluated under different data processing methods.
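A minimal sketch of the accuracy metric described above assumes that chain length is treated as a discrete label. Under that assumption, a prediction counts as correct only when it matches the recorded chain length exactly. The function name is illustrative.

```python
def accuracy(predicted, actual):
    """Fraction of accidents whose predicted chain length equals the recorded one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(int(p == a) for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy example: three of four predicted chain lengths match.
print(accuracy([1, 2, 1, 3], [1, 2, 2, 3]))  # 0.75
```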
4.2.2. Comparative analysis of data processing methods.
Table 8 presents the simulation results of training the neural network on five datasets: the original data, data with feature selection, augmented data, data with both feature selection and augmentation, and data with SHAP-based feature optimization.
The simulation results show that when no processing is applied to the statistical samples, the prediction accuracy of the model is relatively low. After small-scale data augmentation, the accuracy improves; however, because of the large number of input features, the model exhibits signs of overfitting. When only feature selection is performed, the accuracy remains low because of insufficient data volume. Combining data augmentation with feature selection improves the accuracy significantly, with the prediction accuracy for accident chain length exceeding 0.95. However, as the augmentation and replication quantities increase, the model’s accuracy decreases.
To further enhance the model’s performance and interpretability, we applied SHAP for feature optimization. By analyzing the contribution of each feature using SHAP, we reduced the feature set from 15 to 10, retaining only the most influential features. This optimization not only improved the model’s accuracy but also mitigated overfitting. Specifically, the training set accuracy increased from 0.97289 to 0.97853, and the test set accuracy improved from 0.97254 to 0.97821. The SHAP-based feature optimization also enhanced the model’s interpretability by providing clear insights into the importance of each feature, enabling a more focused analysis of the factors driving domino effect accidents.
4.2.3. Model robustness verification.
To verify the stability of the model and reduce the bias introduced by a single random split, 5-fold cross-validation was used to evaluate the optimized FC-FNN model. The dataset was randomly divided into 5 subsets. In each round, 4 subsets were used for training and the remaining subset for testing, and the process was repeated 5 times. The average test set accuracy was 0.977 ± 0.002, with a maximum difference of 0.6% between folds (Table 9). This indicates that the model maintains high prediction accuracy under different data distributions and that the results are not an artifact of chance factors.
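The 5-fold procedure described above can be sketched using only the Python standard library. The sketch makes these assumptions:

- `train_and_score` is a hypothetical callback standing in for training the FC-FNN on the training indices and returning its test accuracy.
- The reported ± value is the sample standard deviation across folds.
- The spread is measured as the maximum minus the minimum fold accuracy.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, train_and_score, k=5, seed=0):
    """Run k-fold cross-validation.

    train_and_score(train_idx, test_idx) is a hypothetical callback that trains a
    fresh model on train_idx and returns its accuracy on test_idx.
    Returns (mean accuracy, sample standard deviation, max - min across folds).
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Training set = all folds except the held-out one.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return statistics.mean(scores), statistics.stdev(scores), max(scores) - min(scores)


if __name__ == "__main__":
    # Placeholder scorer; a real one would fit the FC-FNN on train_idx.
    def dummy_scorer(train_idx, test_idx):
        return 0.97

    mean_acc, sd_acc, spread = cross_validate(255, dummy_scorer)
    print(f"{mean_acc:.3f} ± {sd_acc:.3f} (max fold difference {spread:.3f})")
```

Augmented and replicated records are near-copies of originals. One reasonable design choice is therefore to split the original records into folds first and to apply augmentation and replication only inside each training fold. This prevents copies of a test record from appearing in training.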
Therefore, in future risk predictions of domino effect accidents, the prediction model can be used to determine accident chain lengths, enabling more precise construction of accident scenarios for quantitative risk assessment. This supports a better understanding of how accidents develop and how far their impacts extend, thereby facilitating more effective prevention and response measures.
5. Conclusion
This study explored domino effect accidents in chemical storage tanks, with a focus on predicting accident chain length. By analyzing 255 historical accidents from 1970 to 2024 and applying advanced machine learning techniques, several significant conclusions were drawn.
- 1) The chain length of domino effect accidents was precisely defined, and the dataset was categorized into atmospheric and pressurized tank accidents. The analysis revealed that pressurized storage tanks had a 69.5% secondary/higher-order accident rate, which far exceeded the 16.1% rate of atmospheric tanks. Leakage, especially in pressurized tanks (80%), was identified as the most common initiating event.
- 2) A fully connected feedforward neural network (FC-FNN) integrated with word embedding was developed, and input features were optimized using SHAP analysis. To address data scarcity, systematic data augmentation was implemented, significantly improving the model’s robustness. The optimized “Data with SHAP Optimization (10 Features)” model achieved an accuracy exceeding 0.978 in predicting accident chain lengths, outperforming the baseline models (training set: 0.97289 → 0.97853; test set: 0.97254 → 0.97821).
- 3) The integration of SHAP-based feature selection with an FC-FNN model establishes an interpretable framework that identifies critical factors driving accident propagation in chemical storage tanks, such as substance type and operational status. Statistical analysis further reveals that pressurized tanks are highly susceptible to multi-stage accident expansions, underscoring the urgency for targeted safety measures to mitigate domino effect risks.
To further enhance the reliability and applicability of the domino effect prediction framework, future work will focus on three aspects:

- Deploying IoT sensor networks to integrate real-time storage tank operational data with historical records, reducing reliance on static datasets.
- Using hybrid models that fuse data-driven and physics-informed networks, with embedded mechanical equations that constrain predictions by physical laws, to reduce dependence on historical data.
- Pre-training models on public industrial datasets and fine-tuning them with tank-specific data to ease small-sample validation issues.
References
- 1. Zeng T, Chen G, Reniers G, Men J. Developing a barrier management framework for dealing with Natech domino effects and increasing chemical cluster resilience. Process Safe Environ Protect. 2022;168:778–91.
- 2. Zeng T, Wei L, Reniers G, Chen G. A comprehensive study for probability prediction of domino effects considering synergistic effects. Reliab Eng Syst Safe. 2024;251:110318.
- 3. Khan FI, Abbasi SA. An assessment of the likelihood of occurrence, and the damage potential of domino effect (chain of accidents) in a typical cluster of industries. J Loss Prev Process Ind. 2001;14(4):283–306.
- 4. Zhang M, Zheng F, Chen F, Pan W, Mo S. Propagation probability of domino effect based on analysis of accident chain in storage tank area. J Loss Prev Process Ind. 2019;62:103962.
- 5. Feng JR, Yu G, Zhao M, Zhang J, Lu S. Dynamic risk assessment framework for industrial systems based on accidents chain theory: The case study of fire and explosion risk of UHV converter transformer. Reliab Eng Syst Safe. 2022;228:108760.
- 6. Li J, Yang Z, He H, Guo C, Chen Y, Zhang Y. Risk causation analysis and prevention strategy of working fluid systems based on accident data and complex network theory. Reliab Eng Syst Safe. 2024;252:110445.
- 7. Hou L, Wu X, Wu Z, Wu S. Pattern identification and risk prediction of domino effect based on data mining methods for accidents occurred in the tank farm. Reliab Eng Syst Safe. 2020;193:106646.
- 8. Darbra RM, Palacios A, Casal J. Domino effect in chemical accidents: main features and accident sequences. J Hazard Mater. 2010;183(1–3):565–73. pmid:20709447
- 9. Hemmatian B, Abdolhamidzadeh B, Darbra RM, Casal J. The significance of domino effect in chemical accidents. J Loss Prev Process Ind. 2014;29:30–8.
- 10. Chen G, An T, Chen P. Review on historical analysis of domino effect in chemical accidents. J Safe Sci Tech. 2015;11(04):64–70.
- 11. Liang C, Zhang M, Zhu J, Zuo Y, Yang J, Cui X. Escalation probabilistic model of atmospheric tank under coupling effect of thermal radiation and blast wave in domino accidents. J Loss Prev Process Ind. 2022;80:104888.
- 12. Wang S, Zhang Y, Piao X, Lin X, Hu Y, Yin B. Data-unbalanced traffic accident prediction via adaptive graph and self-supervised learning. Appl Soft Comput. 2024;157:111512.
- 13. Khakzad N, Khan F, Amyotte P, Cozzani V. Risk management of domino effects considering dynamic consequence analysis. Risk Anal. 2014;34(6):1128–38. pmid:24382306
- 14. Khakzad N. Application of dynamic Bayesian network to risk analysis of domino effects in chemical infrastructures. Reliab Eng Syst Safe. 2015;138:263–72.
- 15. Huang C, Zhang C, Dai P, Bo L, editors. Deep dynamic fusion network for traffic accident forecasting. Proceedings of the 28th ACM International Conference on Information and Knowledge Management; 2019.
- 16. Folke C, Carpenter SR, Walker B, Scheffer M, Chapin T, Rockström J. Resilience thinking: integrating resilience, adaptability and transformability. Ecol Soc. 2010;15(4).
- 17. Reniers G, Cozzani V. Domino effects in the process industries: modelling, prevention and managing. Newnes; 2013.
- 18. Woods DD. Four concepts for resilience and the implications for the future of resilience engineering. Reliab Eng Syst Safe. 2015;141:5–9.
- 19. Chen C, Reniers G, Yang M. A resilience-based approach for the prevention and mitigation of domino effects. In: Integrating safety and security management to protect chemical industrial areas from domino effects. 2022. p. 155–76.
- 20. Sun H, Wang H, Yang M, Reniers G. Resilience-based approach to safety barrier performance assessment in process facilities. J Loss Prev Process Ind. 2021;73:104599.
- 21. Ab Rahim MS, Reniers G, Yang M, Bajpai S. Risk assessment methods for process safety, process security and resilience in the chemical process industry: A thorough literature review. J Loss Prev Process Ind. 2024;88:105274.
- 22. Gholamizadeh K, Zarei E, Yazdi M, Ramezanifar E, Aliabadi MM. A hybrid model for dynamic analysis of domino effects in chemical process industries. Reliab Eng Syst Safe. 2024;241:109654.
- 23. Amin MdT, Scarponi GE, Cozzani V, Khan F. Dynamic Domino Effect Assessment (D2EA) in tank farms using a machine learning-based approach. Comput Chem Eng. 2024;181:108556.
- 24. Su M, Wei L, Zhou S, Yang G, Wang R, Duo Y, et al. Study on Dynamic Probability and Quantitative Risk Calculation Method of Domino Accident in Pool Fire in Chemical Storage Tank Area. Int J Environ Res Public Health. 2022;19(24):16483. pmid:36554371
- 25. Li X, Chen G, Amyotte P, Khan F, Alauddin M. Vulnerability assessment of storage tanks exposed to simultaneous fire and explosion hazards. Reliab Eng Syst Safe. 2023;230:108960.
- 26. Zhang Q, Wu J, Bai Y, Zhang C, Wang J, Qin T. Agent-based risk modeling of domino effects in urban LNG stations. J Loss Prev Process Ind. 2024;89:105300.
- 27. Li Y, Yu L, Jing Q. Dynamic risk assessment method for urban hydrogen refueling stations: A novel dynamic Bayesian network incorporating multiple equipment states and accident cascade effects. Int J Hydrogen Energy. 2024;54:1367–85.
- 28. He Z, Shen K, Lan M, Weng W. The effects of dynamic multi-hazard risk assessment on evacuation strategies in chemical accidents. Reliab Eng Syst Safe. 2024;246:110044.
- 29. Tugnoli A, Scarponi GE, Antonioni G, Cozzani V. Quantitative assessment of domino effect and escalation scenarios caused by fragment projection. Reliab Eng Syst Safe. 2022;217:108059.
- 30. Chen G, An T, Chen P. Estimation model of emergency resource demands for chemical accidents involving domino effect. J Safe Sci Tech. 2015;25(04):87–93.
- 31. Zhou J, Reniers G, Khakzad N. Application of event sequence diagram to evaluate emergency response actions during fire-induced domino effects. Reliab Eng Syst Safe. 2016;150:202–9.
- 32. Gomes JO, Borges MR, Huber GJ, Carvalho PVR. Analysis of the resilience of team performance during a nuclear emergency response exercise. Appl Ergon. 2014;45(3):780–8.
- 33. Schweidtmann AM, Esche E, Fischer A, Kloft M, Repke J, Sager S, et al. Machine Learning in Chemical Engineering: A Perspective. Chem Ingen Tech. 2021;93(12):2029–39.
- 34. Che J, Hu K, Xia W, Xu Y, Li Y. Short-term air quality prediction using point and interval deep learning systems coupled with multi-factor decomposition and data-driven tree compression. Appl Soft Comput. 2024;166:112191.
- 35. Niu Y, Shuai B, Zhang R, Fa H, Huang W. Short-term inbound passenger flow prediction at high-speed railway stations considering the departure passenger arrival pattern. Appl Soft Comput. 2024;166:112219.
- 36. Yang Y, Zhang J, Wang L. Semi-supervised prediction method for time series based on Monte Carlo and time fusion feature attention. Appl Soft Comput. 2024;167:112283.
- 37. Popoola SI, Adetiba E, Atayero AA, Faruk N, Calafate CT. Optimal model for path loss predictions using feed-forward neural networks. Cogent Eng. 2018;5(1):1444345.
- 38. Belmahdi B, Louzazni M, Akour M, Cotfas DT, Cotfas PA, El Bouardi A. Long-Term Global Solar Radiation Prediction in 25 Cities in Morocco Using the FFNN-BP Method. Front Energy Res. 2021;9.
- 39. Kothona D, Panapakidis IP, Christoforidis GC. A novel hybrid ensemble LSTM‐FFNN forecasting model for very short‐term and short‐term PV generation forecasting. IET Renewable Power Gen. 2021;16(1):3–18.
- 40. Selva Birunda S, Kanniga Devi R. A review on word embedding techniques for text classification. Innovative Data Communication Technologies and Application: Proceedings of ICIDCA 2020. 2021. p. 267–81.
- 41. Mikolov T. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. 2013.
- 42. Goldberg Y. word2vec Explained: deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722. 2014.
- 43. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30.
- 44. Marcílio WE, Eler DM, editors. From explanations to feature selection: assessing SHAP values as feature selection mechanism. 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI). IEEE; 2020.
- 45. Liu Z, Zhang X, Han L, Zhou G, Miao Y. Fault classification of helicopter main reducer based on fully connected neural network. J Beijing Inf Sci Technol Univ. 2023;38(04):61–6.
- 46. Necci A, Cozzani V, Spadoni G, Khan F. Assessment of domino effect: State of the art and research Needs. Reliab Eng Syst Safe. 2015;143:3–18.
- 47. Khan F, Amin MT, Cozzani V, Reniers G. Domino effect: Its prediction and prevention—An overview. Method Chem Process Safe. 2021;5:1–35.
- 48. Li J, Reniers G, Cozzani V, Khan F. A bibliometric analysis of peer-reviewed publications on domino effects in the process industry. J Loss Prev Process Ind. 2017;49:103–10.
- 49. Cozzani V, Gubinelli G, Salzano E. Escalation thresholds in the assessment of domino accidental events. J Hazard Mater. 2006;129(1–3):1–21. pmid:16159694
- 50. Yang P, Huang X, Peng L, Zheng Z, Wu X, Xing C. Safety evaluation of major hazard installations based on regional disaster system theory. J Loss Prev Process Ind. 2021;69:104346.
- 51. Xie Q. Analysis of several influencing factors on Word2Vec text classification effect. Mod Inf Tech. 2024;8(01):125–9.
- 52. Liu F. Determination of volume and diameter and height of large storage tanks. Chem Eng Mach. 2019;46(03):316–8.
- 53. GB 50341. Design Code for Vertical Cylindrical Steel Welded Storage Tanks. Ministry of Housing and Urban-Rural Development of the People’s Republic of China. 2014.