Abstract
Stockout risk is a persistent challenge in supply chain management, undermining both operational efficiency and customer satisfaction. This study adopts a multi-method approach to investigate the causal effect of lead time on stockout risk by integrating causal inference techniques with predictive analytics. The proposed framework combines Propensity Score Matching (PSM), Instrumental Variables (IV-2SLS), Inverse Probability Weighting (IPW), and Doubly Robust Estimation (DRE) alongside machine learning (ML) algorithms and time series forecasting. Using a dataset of 20,000 supply chain incidents, the study estimates the Average Treatment Effect (ATE) and evaluates predictive model performance. PSM generated the most credible ATE (0.882), confirming a strong causal link between lead time and stockout risk. IV analysis using supplier distance as an instrument yielded a reduced and statistically insignificant ATE (0.5535, p = 0.3148), suggesting instrument weakness. Among ML models, Random Forest and LightGBM achieved superior predictive accuracy (R2 = 0.25; MSE = 0.736), while Moving Average forecasting effectively captured stockout patterns over time (R2 = 0.883). The findings identify PSM as the most robust technique for causal inference. This study advances the literature by integrating causal inference, ML, and time series methods, offering practical, data-driven insights to strengthen operational resilience and guide proactive inventory management.
Citation: Hafaf E, Alzubi AB, Iyiola K, Aljuhmani HY (2026) A simulation-based hybrid causal predictive framework for stockout risk analysis in supply chain. PLoS One 21(6): e0350429. https://doi.org/10.1371/journal.pone.0350429
Editor: Erfan Babaee Tirkolaee, Istinye University: Istinye Universitesi, TÜRKIYE
Received: May 15, 2025; Accepted: May 13, 2026; Published: June 11, 2026
Copyright: © 2026 Hafaf et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The dataset used in this study cannot be shared due to confidentiality restrictions. However, the code implemented in the methodology for data preprocessing and prediction analysis is available upon reasonable request from enstitu@akun.edu.tr. This institutional email is managed independently of the authors and will remain available to handle data access inquiries, ensuring continuity and long-term availability. The dataset will also be securely stored within our institution’s archival system to guarantee persistent access over time.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Supply chain management has adopted increasingly sophisticated approaches to reduce risks and improve operational services [1]. Notably, data analysis and machine learning (ML) have been applied to tasks such as supplier risk evaluation, stock optimization, and demand forecasting. By comparing different databases to find trends and anticipate threats, these methods have shown promise in improving supply chain resilience [2]. Causal ML methods have also been proposed for designing supply chain risk management responses [3]. Recent studies have increasingly concentrated on applying novel methods to predict and minimize the risks related to late deliveries and stockouts.
Furthermore, artificial intelligence (AI) and ML are transforming supply chain management by enabling data-driven decision-making, optimizing processes, and improving performance across the supply chain network [4]. With the rapid increase of data from numerous sources, such as Internet of Things (IoT) devices, enterprise resource planning (ERP) systems, transportation systems, and customer demand information, AI and ML approaches have become vital in handling the multifaceted nature of contemporary supply chains [5]. These tools aid in demand prediction, where predictive algorithms use previous sales data, seasonal trends, and other variables such as economic patterns and weather conditions to estimate projected demand [6]. This enables businesses to optimize inventory levels, prevent stockouts, and avoid overstock situations, which can lead to higher carrying costs. AI-driven inventory optimization allows for tracking and supply, decreasing waste and improving supply chain efficiency [7]. Fig 1 presents a bibliometric overview of the literature indexed in Scopus. Over 22 articles published between 2024 and 2025 were identified using keywords such as supply chains, ML, prediction, and causal inference. The bibliometric visualization was produced with the VOSviewer application.
AI-driven forecasting of risk factors and conditions enables businesses to identify and curtail possible supply chain issues before they occur. This improves procurement, supplier risk control, and logistics control through techniques such as ML, deep learning, and reinforcement learning [8].
Additionally, integrating AI into analysis yields a more flexible, robust, and cost-effective supply chain that adapts to changes in both demand and supply factors, reduces risks, and boosts customer satisfaction [9]. Another important use of AI in supply chains is risk management, particularly in managing stockout problems. ML algorithms can analyze vast datasets to identify trends and predict crises before they occur, allowing businesses to take preemptive risk management measures. Businesses that use AI-driven analytics may be able to establish a more responsive, flexible, and cost-effective supply chain that reacts to changing demand and supply scenarios, reducing uncertainty and improving customer satisfaction [10].
Causal inference is an important analytical tool in supply chain decision-making because it distinguishes correlation from causation when evaluating the variables that influence stockout risk [11]. By comparison, traditional statistical methods and ML models often focus on predictive accuracy but fail to uncover the underlying causal relationships between supply chain components. For example, while an algorithm could demonstrate that longer lead times are associated with increased stockout risk, it cannot demonstrate that decreasing lead time will actually reduce stockouts [12]. Causal inference has grown increasingly significant in supply chain management because it allows organizations to go beyond correlation-based observations and discover actual cause-and-effect linkages. Unlike conventional predictive analysis, which focuses on projecting future trends, causal inference approaches can assess the direct influence of variables such as lead time, supplier disruptions, pricing strategies, and demand changes on stockout risk and overall supply chain performance [13].
To address these research gaps and achieve a comprehensive understanding of stockout risk, this study is guided by the following key objectives:
- To develop a hybrid decision-layered framework that combines causal inference, predictive modeling and temporal analysis within a unified architecture for stockout risk assessment.
- To validate the integrated framework in a simulation-based environment, enabling controlled methodological evaluation of causal and predictive components.
- To evaluate intervention relevance through estimation of the causal effect of the lead time on stockout risk using established causal inference methods under explicit identification assumptions.
- To support operational forecasting by implementing ML methods for short-term prediction of stockout risk based on multivariate supply chain features.
2. Related studies
Recent studies have focused on risk assessment and mitigation in supply chains using novel analytical tools such as ML, causal inference, causal ML, and time series forecasting. This section organizes previous work into three major topics and identifies the specific gaps that motivate our integrated, multi-method framework.
2.1. Predictive ML approach in supply chains
Predictive modelling in supply networks is an important analytical strategy that utilizes ML and sophisticated mathematical methods to forecast future supply chain behaviors, optimize resource allocation, and reduce risks associated with unpredictable demand and logistical uncertainty.
[14] applied deep learning algorithms to macroeconomic statistics to predict late deliveries in the automobile industry, emphasizing the influence of external economic conditions on supply chain efficiency. Additionally, [4] evaluated the efficacy of ML techniques for forecasting customer backorders to enhance supply chain efficiency. The study focuses on the difficulty of balancing model accuracy and computational efficiency in complicated supply chain contexts. By comparing models built with 22 variables to models using only the best five, the authors showed that simplified ML algorithms can cut computational expense by 30% to 98% while maintaining good accuracy, with only a 0.6% to 4.2% decrease in F1-score. Using accessible datasets, the study demonstrates the practical benefit of simplified prediction algorithms for real-time inventory and backorder management, indicating their suitability for wider industrial deployment.
In a similar study, [15] employed ML and AI approaches to forecast delays in truck delivery operations. Their research tackles transportation management issues by predicting delays, allowing management to plan and deploy resources more accurately. Practical logistics data were analyzed using a variety of ML algorithms, such as decision trees, random forests, Artificial Neural Networks (ANN), and support vector machines. The study reported that these algorithms predicted delays with up to 97.6% accuracy, with the AdaBoost classifier outperforming the others in both accuracy and precision. This study illustrates the potential of ML to improve logistics efficiency and reduce supply chain delays. [2,15] examined the application of ML approaches to prediction in supply chain networks. The study addresses the constraints of traditional maintenance techniques by using ML methods to detect equipment failures, reduce unexpected downtime, and improve supply chain efficiency. By analyzing external variables, sensor inputs, and historical data, the research reveals how predictive analysis may detect possible issues before they cause disruptions. The study compares multiple ML models designed for maintenance prediction, providing useful insights into how ML may optimize supply chain operations and increase efficiency through proactive maintenance tactics [15].
[16] reported the development of an ML-based predictive and alerting system to improve practical decision-making in an automated warehouse setting. Their algorithm forecasts late orders using information gathered by Warehouse Management Systems (WMS), allowing for action through the designed alerting mechanism. The technology was applied at a tyre distribution company and validated on a Shuttle-Based Storage and Retrieval system. The approach evaluated several ML models and customized predictions for warehouse-specific factors. Their study focuses on the practical advantages of ML for improving short-term operational effectiveness in logistics [16].
A recent study by [17] shows how hybridization and improved integration of models can enhance prediction. The study presented a priority-based architecture integrating space-time analysis with a crash severity technique to assess pedestrian and bicyclist safety. Their results show the importance of hybridization in prediction and how it influences outcomes.
2.2. Causal inference and causal ML in supply chains
Causal inference methods have advanced beyond describing relationships between variables: they identify causal structures, estimate treatment effects, forecast counterfactual events, and guide decision-making in complicated real-world systems. [3] proposed the use of causal ML to design intervention techniques, emphasizing the importance of understanding relationships between variables in order to improve decision-making. Similarly, [18] presented a method integrating causal discovery with reinforcement learning to attribute the root causes of supply risks, tackling the complications of interdependence in supply chains. In the medical supply chain sector, [19] reported a two-stage methodology integrating ML and a Unified Robust Stochastic Programming (URSP) architecture to improve the survivability of medical supply chains (MSCs) during global disease outbreaks. The first stage employs an ML algorithm to forecast infection levels and adjust demand projections accurately. The second stage handles constraints under uncertainty using URSP, which combines stochastic programming and robust optimization with a configurable strength level, as well as Conditional Value-at-Risk (CVaR). A case study in Turkey showed that the integrative strategy outperformed previous methods by providing more cost-effective and robust allocation options. The study emphasizes the significance of combining predictive modelling and robust decision-making techniques to overcome substantial uncertainty within the healthcare supply chain [19].
While causal techniques can help with hypothetical intervention planning and identifying supply chain risk factors, new research shows that these methods are not enough to back up operational decision support in its entirety, since they cannot reliably predict the short term or analyze temporal patterns. In their discussion of the shortcomings of models that attempt to infer causation alone, [20] highlight the necessity for hybrid analytical techniques that also incorporate predictive performance and planning skills.
2.3. Hybridization of intelligent computational systems
The adoption of multiple intelligent mathematical models in supply chains is an advanced approach that integrates various computational frameworks to address different supply chain decision problems, resulting in significantly more robust and responsive supply chain systems. Many studies have reported the hybridization [21] of such models in supply chains, such as [5], who assessed the application of intelligent computational methods such as IoT, ML, AI, and blockchain to improve accuracy and openness in smart manufacturing supply chains. The study focusses on two application cases, demand-side forecasting and buyer.
Furthermore, the increasing complexity of supply chain networks has resulted in greater reliance on predictive models and causal inference to minimize stockout risk. Recent studies have shown various ways to improve risk management, including causal ML, reinforcement learning, and Bayesian networks [22]. Causal inference has grown increasingly significant in supply chain management because it allows organizations to go beyond correlation-based observations and discover actual cause-and-effect linkages. Unlike conventional predictive analysis, which focuses on projecting future trends, causal inference approaches can assess the direct influence of variables such as lead time, supplier disruptions, pricing strategies, and demand changes on stockout risk and overall supply chain performance [23]. Table 1 summarizes related studies that adopted various ML techniques.
Despite the advances reported in these studies, a significant gap remains in merging causal inference approaches with predictive modelling to thoroughly assess and predict stockout risks. While ML algorithms have shown promise for detecting possible disruptions, they frequently fail to identify a causal relationship between significant factors such as lead time and stockout risk. This constraint impedes the creation of efficient intervention techniques for reducing stockout incidents.
Our study addresses a research gap by combining causal inference and predictive modelling to understand and forecast stockout threats. While previous research focusses on either ML for forecasting or optimization approaches within certain contexts, it falls short of identifying the causal associations between major risks in the supply chain, like lead time consequences on stockouts.
2.4. Methodological contribution and framework synthesis
While many articles have applied methodological frameworks such as ML, econometric, or time series techniques to supply chain risk issues, these approaches are typically adopted in isolation and target a single analytical objective. Predictive models focus mainly on forecasting accuracy, time series approaches on analyzing temporal behavior, and causal inference on intervention effects. However, operational decision-making requires coherent knowledge of when risk is likely to occur, what drives disruptions, and how risk evolves over time. The lack of structured integration of these perspectives therefore limits the practical value of current analytical techniques.
This study contributes methodologically by proposing a simulation-based hybrid causal-predictive framework that integrates predictive modeling, causal inference, and temporal analysis into a single decision-oriented architecture. The contribution therefore lies not in proposing new estimators but in defining a structured synthesis in which each analytical process serves a distinct and complementary decision role.
3. Materials and methods
3.1. Research methodology
The study uses a simulation-based methodological design to evaluate a hybrid decision-layered framework for stockout risk analysis. The framework integrates causal inference, predictive modeling, and temporal analysis within a structured architecture to address intervention relevance, operational forecasting, and temporal monitoring.
The methodological objective is the validation of the proposed hybrid framework under controlled and structured conditions. Therefore, a synthetic dataset representing supply chain operations, based on similar studies, was generated to mimic realistic relationships among standard supply chain variables such as inventory levels, logistics factors, demand, stockout risk, and supplier reliability. The simulation environment allows the analytical components to be evaluated under well-defined dependencies and with reproducibility. Data preprocessing steps such as data cleaning, data splitting, and feature engineering, similar to [24], were adopted. Traditional ML models, causal inference techniques, and time series methods were assessed with their respective evaluation metrics, as shown in Fig 2.
The analytical elements that comprise the framework are assessed using different criteria because they perform different decision-making roles. Causal inference approaches are evaluated on interpretability, consistency across estimators, and conceptual validity, in relation to their ability to estimate intervention impact and their adherence to identification assumptions. Predictive models are evaluated with statistical performance indicators (MAE, MSE, RMSE, and R2) that gauge forecast accuracy. Time-series approaches are evaluated on how well they capture temporal structure and how closely their forecasts align with the observed series.
3.2. Implementation and platform
The experiments were carried out in Python (v3.9) using scikit-learn, causal ML, and additional libraries. During exploratory data analysis, a normalization approach similar to [25] was adopted, in which the dataset was scaled using the MinMaxScaler. Additionally, the categorical variable “supplier_type” was converted to numerical form using the OneHotEncoder.
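The preprocessing described above can be sketched with scikit-learn as follows. This is a minimal illustration rather than the study's code: the four-row frame and its values are invented, the category labels "local" and "non_local" are hypothetical, and only the column names come from the data description.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy frame: values and category labels are made up for illustration.
df = pd.DataFrame({
    "lead_time": [3, 10, 7, 14],
    "inventory_level": [120, 40, 80, 20],
    "supplier_type": ["local", "non_local", "local", "non_local"],
})

numeric_cols = ["lead_time", "inventory_level"]
categorical_cols = ["supplier_type"]

preprocess = ColumnTransformer(
    transformers=[
        # Rescale each numeric column to the [0, 1] range.
        ("scale", MinMaxScaler(), numeric_cols),
        # Expand the categorical column into one indicator column per level.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

X = preprocess.fit_transform(df)
print(X.shape)
```

In practice the transformer would be fitted on the training split only and then applied to the test split, so that the scaling range does not leak information from held-out data.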
3.3. Data description
The study adopted a synthetically generated supply chain dataset structured to approximate real-world distribution network dynamics [26]. The dataset was generated from parameterized probability distributions that reflect operational ranges reported in supply chain studies. It contains 20,000 instances of 19 variables relevant to supply chain operations, covering order processing, inventory management, supplier reliability, logistics, and stockout risk factors. Integer-type variables such as order_id, lead_time, inventory_level, demand, order_size, and warehouse_capacity measure supply chain activities. Float-type variables such as supplier_reliability, stockout_risk, seasonality_factor, economic_index, warehouse_utilization, and market_demand_trend reflect continuous risk and economic measures. Additionally, the supplier_type feature is categorical. Together, these variables form a solid foundation for analyzing stockout risk and optimizing supply chain robustness through causal inference, machine learning, and time series forecasting. The data are briefly summarized in Table 2.
3.4. Causal inference
In the causal analysis, the treatment variable was defined as prolonged lead time, operationalized as a binary indicator. For every observation i ∈ {1,2,…,N}, the treatment assignment Tᵢ is specified as:
where Lᵢ represents the lead time for observation i, while L symbolizes the collection of lead-time values in the dataset. The outcome variable Yᵢ signifies stockout risk, characterized as a normalized probability of stockout incidence under operational conditions.
Where; i∈ {1,2,…,N} indexes the observations, Tᵢ ∈ {0,1} represents treatment indicator, and Xᵢ ∈ ℝᵖ is the vector of observed covariates.
Causal inference methods are necessary to go beyond associations and understand the actual impact of lead time on stockout risk. In contrast to conventional ML algorithms that concentrate only on prediction, causal inference approaches help supply chain managers adopt data-driven policy decisions to lower stockout risk by identifying cause-and-effect linkages [23].
The relationships between numerous supply chain elements and two crucial variables, lead time and stockout risk, are shown in Fig 3. Numerous predictors, including inventory level, demand, supplier dependability, logistics costs, shipment delays, and warehouse capacity, are shown on the left side. These elements affect the likelihood of a stockout and the operational effectiveness of the supply chain. The arrows indicate the possible impact of these independent parameters on lead time and stockout risk, illustrating the causal and predictive connections between them and the dependent parameters. Similar to research by [27], this visualization helps the study achieve its objective of better explaining and forecasting stockout difficulties in supply chains through the use of causal inference, ML, and time series forecasting.
The causal graph in Fig 3 shows how 17 important supply chain factors, including demand, inventory level, logistics cost, and supplier dependability, directly affect two important outcomes: lead time and stock out risk. A putative causal relationship is represented by each arrow, emphasizing the ways in which operational, logistical, and economic variables influence these targets’ variability. In order to maximize supply chain performance, this framework facilitates intervention planning and predictive modelling.
The causal inference component of the proposed framework assesses the impact of lead time on stockout risk under a number of conventional identification assumptions. These assumptions describe the circumstances in which the estimated Average Treatment Effects (ATEs) may be interpreted causally rather than associatively. It is assumed that all pertinent confounders influencing both lead time and stockout risk are accounted for (conditional exchangeability), that every unit has a non-zero probability of being assigned to either the treatment or the control group (positivity), and that every supply chain incident is distinct, with no interference among observations. For the instrumental variable analysis, the instrument is presumed to be associated with lead time and to influence stockout risk solely through lead time. These assumptions guide the interpretation of causal estimates in the simulated framework.
3.4.1. Propensity Score Matching (PSM).
PSM approximates a randomized experiment in order to estimate the causal relationship between lead time and stockout risk. It compares treated and untreated observations that have similar propensity scores, where the propensity score is the probability of receiving the treatment.
$$e(X_i) = P(T_i = 1 \mid X_i)$$
where e(Xᵢ) is the probability that observation i is assigned the treatment given the observed covariates Xᵢ, i ∈ {1,2,…,N} indexes the observations, Tᵢ ∈ {0,1} is the treatment indicator, and Xᵢ ∈ ℝᵖ is the vector of p observed covariates [28].
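As a rough sketch of how propensity score matching can be carried out, the example below estimates the score with a logistic model and performs 1:1 nearest-neighbour matching with replacement. The matching scheme, the function name `psm_ate`, and the simulated data are assumptions made for illustration; the paper does not state which matching variant was used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors


def psm_ate(X, T, Y):
    """ATE via 1:1 nearest-neighbour propensity score matching (with replacement)."""
    # Step 1: estimate e(X_i) = P(T_i = 1 | X_i) with a logistic model.
    model = LogisticRegression(max_iter=1000).fit(X, T)
    e = model.predict_proba(X)[:, 1].reshape(-1, 1)

    treated = np.flatnonzero(T == 1)
    control = np.flatnonzero(T == 0)

    # Step 2: for each unit, find the opposite-group unit with the closest score.
    nn_control = NearestNeighbors(n_neighbors=1).fit(e[control])
    nn_treated = NearestNeighbors(n_neighbors=1).fit(e[treated])
    match_of_treated = control[nn_control.kneighbors(e[treated])[1].ravel()]
    match_of_control = treated[nn_treated.kneighbors(e[control])[1].ravel()]

    # Step 3: impute each unit's unobserved potential outcome from its match.
    y1 = Y.astype(float).copy()
    y0 = Y.astype(float).copy()
    y0[treated] = Y[match_of_treated]
    y1[control] = Y[match_of_control]
    return float(np.mean(y1 - y0))


# Synthetic illustration only (not the study's data): the true effect is 1.0.
rng = np.random.default_rng(0)
N = 4000
X = rng.normal(size=(N, 2))
T = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * X[:, 0])))
Y = 1.0 * T + 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=N)

ate_psm = psm_ate(X, T, Y)
print(f"PSM estimate of the ATE: {ate_psm:.3f}")
```

Matching with replacement keeps the bias low when good matches are scarce, at the cost of reusing some units; a caliper on the score distance is a common refinement that this sketch omits.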
3.4.2. Doubly robust estimation.
Doubly robust approaches combine IPW with regression adjustment to produce consistent estimates even if one of the two models (propensity or outcome) is misspecified [1].
The doubly robust estimator is defined as:
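$$\widehat{\mathrm{ATE}}_{DR} = \frac{1}{N}\sum_{i=1}^{N}\left[ g_1(X_i) - g_0(X_i) + \frac{T_i\,\bigl(Y_i - g_1(X_i)\bigr)}{e(X_i)} - \frac{(1 - T_i)\,\bigl(Y_i - g_0(X_i)\bigr)}{1 - e(X_i)} \right]$$
where i ∈ {1,2,…,N} indexes the observations, N is the total number of observations, Tᵢ ∈ {0,1} is the treatment indicator, Yᵢ denotes the stockout risk, Xᵢ ∈ ℝᵖ represents the vector of observed covariates, e(Xᵢ) represents the propensity score, and g₁(Xᵢ) and g₀(Xᵢ) are the outcome models for the treated and control groups [29].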
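The doubly robust idea can be illustrated with the standard augmented IPW (AIPW) form, which uses exactly the ingredients listed above: a propensity model e(X) and two outcome models g₁ and g₀. The sketch below is an assumption-laden illustration, with logistic and linear models chosen for simplicity, a hypothetical clipping threshold, and simulated data in place of the study's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression


def aipw_ate(X, T, Y, clip=0.01):
    """Augmented IPW (doubly robust) estimate of the ATE."""
    # Propensity model e(X); clipping guards against extreme weights (assumed value).
    e = LogisticRegression(max_iter=1000).fit(X, T).predict_proba(X)[:, 1]
    e = np.clip(e, clip, 1.0 - clip)

    # Outcome models g1 and g0, fitted separately on treated and control units.
    g1 = LinearRegression().fit(X[T == 1], Y[T == 1]).predict(X)
    g0 = LinearRegression().fit(X[T == 0], Y[T == 0]).predict(X)

    # Regression contrast plus inverse-probability-weighted residual corrections.
    psi = g1 - g0 + T * (Y - g1) / e - (1 - T) * (Y - g0) / (1.0 - e)
    return float(np.mean(psi))


# Synthetic illustration only: X[:, 0] confounds T and Y, the true effect is 1.0.
rng = np.random.default_rng(0)
N = 4000
X = rng.normal(size=(N, 2))
T = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * X[:, 0])))
Y = 1.0 * T + 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=N)

ate_dr = aipw_ate(X, T, Y)
naive_diff = float(Y[T == 1].mean() - Y[T == 0].mean())
print(f"AIPW: {ate_dr:.3f}   naive difference in means: {naive_diff:.3f}")
```

The naive difference in means is included only to show the confounding bias that the adjustment removes in this simulated setting.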
3.4.3. Instrumental Variables (IV).
Instrumental Variables (IV) analysis is employed when treatment assignment is endogenous, meaning that it is influenced by unobserved variables that also affect the outcome. Supplier distance is used as an instrumental variable for lead time, under the assumption that it influences stockout risk only through lead time.
Second-stage model:
Where;
i ∈ {1,2,…,N} indexes the observations, Tᵢ denotes the treatment variable (lead time), T̂ᵢ is the predicted treatment from the first-stage model, Yᵢ denotes stockout risk, Zᵢ represents supplier distance, and Xᵢ is the vector of observed covariates [30].
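A two-stage least squares (2SLS) procedure of this kind can be sketched with plain NumPy. In the first stage the treatment is regressed on the instrument and covariates; in the second stage the outcome is regressed on the fitted treatment and the same covariates. The function name, the coefficients of the simulated data, and the unobserved confounder U are all hypothetical choices made for this illustration.

```python
import numpy as np


def two_stage_least_squares(Y, T, Z, X):
    """Return the 2SLS coefficient on T, using Z as instrument and X as controls."""
    n = len(Y)
    ones = np.ones((n, 1))

    # First stage: regress T on the instrument and the covariates.
    first = np.hstack([ones, Z.reshape(-1, 1), X])
    alpha, *_ = np.linalg.lstsq(first, T, rcond=None)
    T_hat = first @ alpha

    # Second stage: regress Y on the predicted treatment and the covariates.
    second = np.hstack([ones, T_hat.reshape(-1, 1), X])
    beta, *_ = np.linalg.lstsq(second, Y, rcond=None)
    return float(beta[1])


# Synthetic illustration only: U is an unobserved confounder, the true effect is 0.5.
rng = np.random.default_rng(1)
n = 5000
U = rng.normal(size=n)
X = rng.normal(size=(n, 1))
Z = rng.normal(size=n)  # stand-in for supplier distance
T = 0.8 * Z + U + 0.3 * X[:, 0] + rng.normal(scale=0.5, size=n)  # lead time
Y = 0.5 * T + U + 0.2 * X[:, 0] + rng.normal(scale=0.5, size=n)  # stockout risk

beta_iv = two_stage_least_squares(Y, T, Z, X)

# Ordinary least squares for comparison; it absorbs the confounding from U.
ols_design = np.hstack([np.ones((n, 1)), T.reshape(-1, 1), X])
beta_ols = float(np.linalg.lstsq(ols_design, Y, rcond=None)[0][1])
print(f"2SLS: {beta_iv:.3f}   OLS: {beta_ols:.3f}")
```

The point estimate from this manual procedure is valid, but standard errors taken from the second-stage regression are not; a dedicated IV routine should be used for inference. A weak first stage, as the paper suspects for supplier distance, would make the 2SLS estimate unstable.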
3.4.4. Inverse Probability Weighting (IPW).
IPW weights observations depending on their propensity scores, resulting in a reweighted pseudo-population that simulates a randomized experiment.
$$w_i = \frac{T_i}{e(X_i)} + \frac{1 - T_i}{1 - e(X_i)}$$
where i ∈ {1,2,…,N} indexes the observations, wᵢ is the weight assigned to observation i, Tᵢ ∈ {0,1} is the treatment indicator, and e(Xᵢ) represents the propensity score, defined as the probability of receiving treatment given the observed covariates Xᵢ [31].
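The weighting scheme can be sketched as follows, assuming the usual weights of 1/e(X) for treated units and 1/(1 − e(X)) for control units. The normalized (Hájek) form of the estimator, the clipping threshold, and the simulated data are illustrative choices, not details taken from the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def ipw_ate(X, T, Y, clip=0.01):
    """Normalized inverse probability weighting estimate of the ATE."""
    e = LogisticRegression(max_iter=1000).fit(X, T).predict_proba(X)[:, 1]
    e = np.clip(e, clip, 1.0 - clip)  # assumed guard against extreme weights

    # w_i = T_i / e(X_i) + (1 - T_i) / (1 - e(X_i))
    w = T / e + (1 - T) / (1.0 - e)

    # Weighted means of the outcome in the treated and control pseudo-populations.
    mu1 = np.sum(w * T * Y) / np.sum(w * T)
    mu0 = np.sum(w * (1 - T) * Y) / np.sum(w * (1 - T))
    return float(mu1 - mu0), w


# Synthetic illustration only: the true effect is 1.0.
rng = np.random.default_rng(0)
N = 4000
X = rng.normal(size=(N, 2))
T = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * X[:, 0])))
Y = 1.0 * T + 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=N)

ate_ipw, weights = ipw_ate(X, T, Y)
print(f"IPW estimate of the ATE: {ate_ipw:.3f}")
```

Inspecting the largest weights is a quick check on the positivity assumption: a handful of very large weights indicates regions of the covariate space where one group is almost never observed.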
3.5. Predictive modeling in supply chains
Predictive modelling is an essential framework in supply chain management that allows businesses to forecast stockouts, demand variations, and logistical problems. Predictive techniques employ ML techniques to analyze previous data, detecting trends and forecasting future events, resulting in enhanced decision-making and operational effectiveness [32].
3.5.1. Random Forest.
Random Forest (RF) is an ensemble ML technique that builds numerous decision trees and then combines their predictions to increase accuracy and reduce overfitting. It is especially beneficial for supply chain risk modelling since it accommodates non-linear interactions, performs effectively with high-dimensional datasets, and delivers meaningful feature-importance insights [33].
An RF model is made up of several decision trees (T₁, T₂, …, T_B), each trained on a bootstrap sample of the data and a randomly chosen subset of features, a procedure known as bagging. The final prediction is produced by combining the tree outputs: majority vote for classification and averaging for regression [24].
Step 1: Bootstrapping and Feature Selection [34]
A bootstrap sample (D_b) is drawn with replacement from the entire dataset (D).
At every split, a random subset of m variables is chosen from the full set of p features (m ≪ p).
For regression, the tree outputs are averaged:
$$f(x) = \frac{1}{B}\sum_{b=1}^{B} T_b(x)$$
where f(x) is the predicted stockout risk for input x, B is the number of trees, and T_b(x) is the prediction of the b-th tree [35].
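The averaging step can be checked directly with scikit-learn, whose `RandomForestRegressor` exposes the fitted trees through `estimators_`. The sketch below uses simulated data and arbitrary hyperparameters (50 trees, square-root feature subsampling); none of these settings are taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic illustration only (not the study's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 0.5 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)

forest = RandomForestRegressor(
    n_estimators=50,      # B, the number of trees
    max_features="sqrt",  # m features considered at each split
    bootstrap=True,       # each tree sees a bootstrap sample D_b
    random_state=0,
).fit(X, y)

# Collect T_b(x) for every tree, then average over b = 1..B.
tree_preds = np.stack([tree.predict(X[:10]) for tree in forest.estimators_])
manual_average = tree_preds.mean(axis=0)
forest_pred = forest.predict(X[:10])

print(np.allclose(manual_average, forest_pred))
```

The manual average of the individual tree predictions coincides with the forest's own prediction, which is the aggregation rule for regression described above.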
3.5.2. Extreme gradient boosting for stockout risk prediction.
Extreme Gradient Boosting (XGBoost) is an advanced variant of gradient boosting that improves prediction accuracy, computing efficiency, and the handling of missing data. It is often employed in supply chain risk modelling because of its capacity to capture complicated interactions between variables while reducing overfitting through regularization.
XGBoost creates an additive ensemble of decision trees, with each tree correcting its predecessors' errors by using gradient descent to minimize a loss function [36].
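The additive, error-correcting principle can be illustrated with a plain gradient boosting loop built from scikit-learn regression trees. This is deliberately not XGBoost: it uses squared-error loss only and omits the regularization term, second-order gradient information, and missing-value handling that XGBoost adds. Function names, hyperparameters, and data are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Plain gradient boosting for squared-error loss."""
    base = float(np.mean(y))  # initial constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        # For 0.5 * (y - pred)^2 the negative gradient is the residual.
        residual = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)
        # Each new tree corrects part of the error left by its predecessors.
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return base, trees


def predict_boosted(base, trees, X, learning_rate=0.1):
    """Sum the base value and the shrunken contribution of every tree."""
    return base + learning_rate * sum(tree.predict(X) for tree in trees)


# Synthetic illustration only (not the study's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=400)

base, trees = fit_boosted_trees(X, y)
mse_baseline = float(np.mean((y - base) ** 2))
mse_boosted = float(np.mean((y - predict_boosted(base, trees, X)) ** 2))
print(f"training MSE: baseline {mse_baseline:.3f}, boosted {mse_boosted:.3f}")
```

The learning rate shrinks each tree's contribution, which trades more boosting rounds for a smoother fit; the errors printed here are training errors and say nothing about generalization.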
3.5.3. Light gradient boosting machine for stockout risk prediction.
LightGBM (LGBM) is a gradient boosting framework that focuses on speed and efficiency on large datasets. It differs from typical gradient boosting approaches such as XGBoost in that it employs histogram-based learning and leaf-wise tree growth, which make it computationally efficient and scalable.
Similar to XGBoost, LightGBM iteratively optimizes the loss function by constructing an ensemble of decision trees. Nonetheless, it brings about significant advancements in data processing and tree growth.
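$$\mathcal{L} = \sum_{i=1}^{N} L(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k)$$
where L(yᵢ, ŷᵢ) is the loss function and Ω(f_k) is a regularization term on the k-th tree, used to prevent overfitting [37].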
3.6. Time series forecasting
Time series forecasting is an important part of supply chain management because it allows firms to forecast prospective stockout risks using past trends. In contrast to traditional predictive modelling, time series approaches expressly consider variations in time in data, allowing organizations to foresee and manage risks. In this work, we adopted the Moving Average (MA) technique to evaluate and anticipate stockout risks [38].
3.6.1. Moving average (MA) model.
The Moving Average (MA) approach is a basic time series method that smooths variations in stockout risk by combining previous observations across a defined window. This strategy aids in the identification of patterns and seasonality, allowing for more informed inventory management decisions.
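$$Y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j\,\varepsilon_{t-j}$$
where q represents the order of the moving average process, εₜ is the white noise error, μ represents the mean of the series, Yₜ is the stockout risk, and θ₁,…,θ_q are the moving average coefficients [39].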
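The windowed smoothing described in the text can be sketched as a trailing moving average used for one-step-ahead forecasting. Note that this is the simple rolling-mean smoother, which is distinct from fitting an MA(q) process with estimated coefficients. The window length and the short series below are hypothetical.

```python
import numpy as np


def moving_average_forecast(series, window):
    """One-step-ahead forecast: the mean of the previous `window` observations."""
    series = np.asarray(series, dtype=float)
    forecasts = np.full(len(series), np.nan)  # no forecast until a full window exists
    for t in range(window, len(series)):
        forecasts[t] = series[t - window:t].mean()
    return forecasts


# Made-up stockout risk values, for illustration only.
risk = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 0.6])
forecast = moving_average_forecast(risk, window=3)
print(forecast)
```

A longer window gives a smoother but slower-reacting forecast, so the window length is usually chosen by comparing forecast errors on held-out periods.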
3.7. Model evaluation
Performance metrics or model evaluation are quantitative evaluations of a model’s efficiency, particularly in mathematical modelling, ML, and prediction analysis. In regard to the task for which a model is intended, they assist both scholars and practitioners in evaluating the model’s general efficacy, precision, accuracy, and resilience [33]. In this study, the prediction algorithms’ accuracy and reliability were assessed using conventional performance indicators such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Mean Squared Error (MSE), and Coefficient of Determination (R2) as shown in Table 3.
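For reference, the four indicators can be computed from their standard definitions as in the sketch below. The function name and the toy values are illustrative and not taken from the study.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, and R2 computed from their standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = float(np.mean(np.abs(err)))
    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))
    # R2 compares the residual sum of squares with the variation around the mean.
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Toy values for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(metrics)
```

MAE and RMSE are expressed in the units of the target, whereas R2 is unitless and can be negative when a model performs worse than predicting the mean.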
4. Results and discussion
4.1. Exploratory and dependency
The dataset was analyzed to explore the relationships between the confounders and lead time, the treatment variable.
The scatterplots in Fig 4 show lead time against numerous supply chain variables. The points are densely and evenly distributed, indicating little relationship between lead time and the independent parameters. Additionally, client priority has a discrete distribution, which indicates categorical data. The visualizations point to the need for additional statistical evaluation, such as correlation coefficients or a regression model, to evaluate potential relationships.
Furthermore, the scatter plots in Fig 5 show how lead time relates to several supply chain variables for each supplier type. The visualizations indicate modest differences in lead time behavior depending on whether the supplier is local or not.
The distribution plots in Fig 6 show the spread and distributional trends of the major supply chain variables. Most numerical features, including lead time, inventory level, demand, logistics cost, and supplier reliability, are approximately uniformly distributed, indicating a balanced dataset. However, stockout risk is heavily skewed, implying that extreme outcomes (high stockout risk) are uncommon. Customer preference is bimodal, suggesting two distinct customer groups. Managing time and shipment delays exhibit distinct periodic patterns, which could reflect systematic scheduling or recurring influences. These findings inform feature selection and transformation for predictive modelling and causal inference.
The correlation matrix in Fig 7 shows the pairwise dependencies among the supply chain variables. Lead time and stockout risk have a moderate positive correlation (~0.48), suggesting that longer lead times may increase stockout risk. Shipment delay has a weaker correlation (~0.17) with stockout risk, indicating a possible but indirect influence. Most other features are weakly correlated, indicating few direct linkages. This insight guides feature selection for predictive modelling and causal inference by focusing on the factors most relevant to stockout risk.
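A pairwise correlation matrix of this kind can be sketched as follows, assuming Pearson correlation is the measure used. The column names and values are hypothetical placeholders, not the study's data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Map each pair of column names to its Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical toy columns, for illustration only.
data = {
    "lead_time": [2.0, 4.0, 6.0, 8.0, 10.0],
    "stockout_risk": [0.1, 0.3, 0.2, 0.5, 0.6],
    "shipment_delay": [1.0, 0.0, 2.0, 1.0, 3.0],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

The matrix is symmetric with ones on the diagonal, so in practice only the upper or lower triangle needs to be inspected when screening features.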
4.2. Experimental results
Table 4 summarizes the stockout risk analysis, which combines causal inference approaches, ML techniques, and time series forecasting. Among the causal inference approaches, PSM and IPW yielded substantial ATEs, indicating a strong causal association between lead time and stockout risk. The IV technique, however, had a lower ATE of 0.5535, with a p-value of 0.3148 and an R² of 0.2435. This suggests a weaker estimated causal effect, possibly due to poor instrument selection or unobserved confounders.
RF and LGBM each achieved an R² of 0.25, whereas XGBM had a lower R² of 0.153 and a higher MSE of 0.835, indicating differences in predictive accuracy. The Moving Average technique outperformed the other approaches, with the highest R² of 0.883 and the lowest MAE of 0.675, suggesting a strong ability to identify patterns in stockout trends over time. This multi-method approach, summarized in Table 4, provides a solid foundation for identifying causal relationships and improving predictability in supply chain stockout risk management. Furthermore, to establish a baseline, we applied a standard Linear Regression model to stockout risk prediction. The regression model recorded a very low R² of 0.00063, meaning that it explains almost none of the variation in the dependent variable, and an extremely high MSE of 18.89. The ensemble models, Random Forest and LightGBM, attained R² values of 0.25 and lower MSEs of 0.736–0.741, indicating a significant improvement in predictive ability. This contrast demonstrates that the ensemble models outperform simple linear methods for modelling complex supply chain behaviors.
In addition, the IV causal inference approach yielded an Average Treatment Effect (ATE) of 0.5535 with a p-value of 0.3148, suggesting that the estimated effect was not statistically significant at conventional thresholds. The approach's R² of 0.2435 indicates moderate explanatory power. It is slightly lower than that of the RF and LGBM models, which both attained an R² of 0.25, and substantially lower than that of the Moving Average time series approach, which attained the highest R² of 0.883. This high R² reflects the strong persistence and autocorrelation in the simulated stockout process rather than model complexity, a consequence of the controlled simulation setup.
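To illustrate why instrument strength matters, the sketch below applies the single-instrument IV estimator to synthetic data. With one instrument and no covariates, 2SLS reduces to the ratio cov(Z, Y) / cov(Z, D). The data-generating process, coefficients, and variable roles are assumptions made for illustration and do not reproduce the study's simulation.

```python
import random

def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def iv_estimate(z, d, y):
    """Single-instrument IV estimate: cov(Z, Y) / cov(Z, D)."""
    return cov(z, y) / cov(z, d)

def first_stage_f(z, d):
    """F-statistic of the first-stage regression of D on Z (one instrument)."""
    n = len(z)
    r2 = cov(z, d) ** 2 / (cov(z, z) * cov(d, d))
    return r2 * (n - 2) / (1.0 - r2)

# Synthetic data (assumption): u is an unobserved confounder of d and y.
rng = random.Random(0)
true_effect = 0.5
z, d, y = [], [], []
for _ in range(5000):
    u = rng.gauss(0, 1)
    zi = rng.gauss(0, 1)                          # instrument, e.g. supplier distance
    di = 0.8 * zi + u + rng.gauss(0, 1)           # treatment, e.g. lead time
    yi = true_effect * di + u + rng.gauss(0, 1)   # outcome, e.g. stockout risk
    z.append(zi)
    d.append(di)
    y.append(yi)

ols_slope = cov(d, y) / cov(d, d)   # biased upward by the confounder u
iv_slope = iv_estimate(z, d, y)     # close to true_effect when z is strong
f_stat = first_stage_f(z, d)        # rule of thumb: F below about 10 signals weakness
print(ols_slope, iv_slope, f_stat)
```

In this sketch the instrument is strong by construction. Lowering the first-stage coefficient (0.8 here) toward zero shrinks the first-stage F-statistic and makes the IV estimate noisy, which is one way a large p-value such as the one reported above can arise.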
An ATE of 0.882 indicates that longer lead times increase the stockout risk index by approximately 0.882 units on a 0–1 scale relative to shorter lead times, under the assumed conditions.
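To make the notion of an ATE concrete, the following sketch estimates one with normalized inverse probability weighting on synthetic data. It assumes a binary treatment (long versus short lead time) and a single binary confounder, so the propensity score is simply the treated share within each stratum. All names, coefficients, and data are hypothetical and do not reproduce the study's estimates.

```python
import random

def ipw_ate(treated, outcome, stratum):
    """Normalized IPW estimate of the ATE with stratum-level propensities."""
    counts = {}
    for t, s in zip(treated, stratum):
        n_all, n_treated = counts.get(s, (0, 0))
        counts[s] = (n_all + 1, n_treated + t)
    # Propensity = share of treated units per stratum (needs 0 < p < 1).
    propensity = {s: n_t / n_all for s, (n_all, n_t) in counts.items()}

    sum_w1 = sum_wy1 = sum_w0 = sum_wy0 = 0.0
    for t, y, s in zip(treated, outcome, stratum):
        p = propensity[s]
        if t == 1:
            sum_w1 += 1.0 / p
            sum_wy1 += y / p
        else:
            sum_w0 += 1.0 / (1.0 - p)
            sum_wy0 += y / (1.0 - p)
    return sum_wy1 / sum_w1 - sum_wy0 / sum_w0

# Synthetic data (assumption): the confounder raises both treatment odds and risk.
rng = random.Random(1)
true_ate = 0.3
treated, outcome, stratum = [], [], []
for _ in range(5000):
    s = 1 if rng.random() < 0.5 else 0
    t = 1 if rng.random() < (0.8 if s else 0.2) else 0
    y = true_ate * t + 0.4 * s + rng.gauss(0, 0.1)
    treated.append(t)
    outcome.append(y)
    stratum.append(s)

n_treated = sum(treated)
mean_treated = sum(y for t, y in zip(treated, outcome) if t) / n_treated
mean_control = sum(y for t, y in zip(treated, outcome) if not t) / (len(treated) - n_treated)
naive = mean_treated - mean_control      # confounded difference in means
ate = ipw_ate(treated, outcome, stratum)  # reweighted estimate
print(naive, ate)
```

The naive difference in means overstates the effect because treated units are concentrated in the high-risk stratum, whereas the weighted estimate recovers a value close to the simulated effect.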
Fig 8 compares the predicted stockout risk with the actual values over an extended period. The dark blue line shows the predictions, whereas the lighter shaded area shows the actual stockout risk levels. The plot shows that the moving average approach captures overall stockout risk patterns, although there are considerable deviations, particularly during strong oscillations. Overall, the model tracks stockout risk well over time, as evidenced by the high R² of 0.883 and low MAE of 0.675 on the experimental data.
To prevent data leakage and to maintain temporal validity, the MA technique was evaluated using a rolling-origin forecasting procedure. At each prediction step, the MA window included only historical observations, so every forecast was based exclusively on information available prior to the forecast horizon. No information from future time points was used in model development or evaluation.
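A rolling-origin evaluation of this kind can be sketched as follows, assuming one-step-ahead forecasts formed as the mean of the preceding `window` observations. The function name, window length, and series values are hypothetical.

```python
def rolling_ma_forecast(series, window):
    """One-step-ahead forecasts: the forecast for index t is the mean of
    series[t - window:t], i.e. only observations strictly before t."""
    forecasts = []
    for t in range(window, len(series)):
        history = series[t - window:t]   # excludes series[t]: no leakage
        forecasts.append(sum(history) / window)
    return forecasts

# Hypothetical stockout risk series, for illustration only.
series = [0.2, 0.4, 0.3, 0.5, 0.6, 0.4, 0.7]
window = 3
forecasts = rolling_ma_forecast(series, window)
actuals = series[window:]                # targets aligned with the forecasts
mae = sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)
print(forecasts, mae)
```

Because each forecast reads only `series[t - window:t]`, changing any observation at or after time t cannot alter the forecast for t, which is the property the leakage check relies on.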
This research identifies actionable factors for supply chain risk management, with implications at both the operational and policy levels. Because the causal analysis establishes lead time as an intervention variable, organizations are advised to strengthen their supplier procurement guidelines, logistics organization, and lead-time buffer contracts as structural risk control measures. Predictive modelling, in turn, makes it possible to foresee short-term risks, facilitating flexible procurement and performance-driven supplier management. The temporal analysis suggests that safety stock programs should reflect periodic and persistent risk trends rather than relying on fixed reserve levels. Taken together, these insights support the creation of unified risk monitoring systems and scenario planning techniques. They also demonstrate how integrated management, forecasting, and monitoring can improve the robustness of supply chain operations.
5. Conclusion
The study offers an innovative, multi-method paradigm that combines causal inference and predictive analytics to tackle the long-standing problem of stockout risk in supply chain management. Unlike previous research that concentrated solely on ML prediction or optimization methods, our approach combines causal inference methods with ML techniques and time series forecasting, providing both predictive and explanatory ability.
In comparison with [19], who employed stochastic programming and ML for outbreak-related healthcare supply chains, our study is more generalizable and interpretable because it estimates the causal influence of lead time on stockout risk in broader supply chain settings. While they focused on resilience under uncertainty, our use of causal estimates (ATE) together with real-time predictive algorithms provides more managerial insight for day-to-day operational strategy.
Similarly, [16] created a forecasting and alerting system to predict warehouse cycle time, whereas [4] improved backorder prediction using basic ML models. However, neither investigated the fundamental causes of supply chain interruptions. Our study fills this gap by providing quantitative evidence of how reducing lead time causally influences stockout probability, thereby supporting better-informed forecasting and decision-making.
Furthermore, studies by [2,15] emphasized the use of AI/ML to predict inefficiencies and maintenance needs. Although methodologically sound, they are mostly predictive and lack causal tools for diagnosis. In contrast, our hybrid technique predicts risk while also helping to identify and quantify controllable levers, such as lead time, that may mitigate it.
In conclusion, this study advances the existing literature in four ways. It employs robust econometric techniques to quantify the causal relationship between lead time and stockouts. It uses ensemble ML algorithms to achieve high predictive performance. It establishes a connection between prediction and causal inference. Finally, it supplies supply chain managers with actionable tactics to prevent disruptions in the first place. These contributions show that, in addition to being innovative, our integrated technique is practically better at facilitating data-driven, causally informed inventory decisions, a feature that is largely lacking in previous research.
While this study offers a strong multidisciplinary approach that combines time series forecasting, ML, and causal inference, it has some limitations. Although several important causal inference methods were applied, newer approaches such as DML or CATE estimation for heterogeneous effects were not extensively investigated. However, the absence of DML does not detract from the validity or originality of the present study. We used a variety of well-established methods to identify causality, each with its own set of assumptions and estimation strategies. The convergence of evidence from PSM and IPW, both of which produced substantial and significant ATE estimates, supports the validity of our results. In addition, despite its limitations, our use of IV analysis provides insight into instrument validity and encourages further methodological development.
References
- 1. Chakravarthy S, Kishore A, Prabhakaran S, Venkadavarahan M. Assessing urban freight tours: a machine learning and life cycle sustainability assessment approach for logistics management. Bus Strat Env. 2025;34(3):2897–916.
- 2. Sharan B, Vijai C, Srivastava K, Singla A, Bhati I, Anil Kumar TC. Leveraging Machine Learning for Predictive Maintenance in Supply Chain Management Systems. In: Proceedings of 2024 International Conference on Science, Technology, Engineering and Management, ICSTEM. 2024: 1–6.
- 3. Wyrembek M, Baryannis G, Brintrup A. Causal machine learning for supply chain risk prediction and intervention planning. Int J Product Res. 2025a;63(15):5629–48.
- 4. Ali A, Jayaraman R, Azar E, Maalouf M. Maximizing supply chain performance leveraging machine learning to anticipate customer backorders. Comput Ind Eng. 2024;194:110414.
- 5. Vijayapriya R, Arun SL, Vengatesan K, Samee S. Smart manufacturing supply chain process strategy using intelligent computation techniques. Int J Interact Des Manuf. 2024;19(2):681–94.
- 6. Gupta I, Martinez A, Correa S, Wicaksono H. A comparative assessment of causal machine learning and traditional methods for enhancing supply chain resiliency and efficiency in the automotive industry. Supply Chain Analyt. 2025a;10:100116.
- 7. Garg V, Gabaldon J, Niranjan S, Hawkins TG. Impact of strategic performance measures on performance: The role of artificial intelligence and machine learning. Transport Res Part E. 2025;198:104073.
- 8. Anukiruthika T, Jayas DS. AI-driven grain storage solutions: Exploring current technologies, applications, and future trends. J Stored Prod Res. 2025;111:102588.
- 9. Suryawanshi V, Kanyal D, Sabale S, Bhoyar V. The role of AI in enhancing hospital operational efficiency and patient care. Multidiscip Rev. 2024;8(5):2025153.
- 10. Huerta-Soto R, Ramirez-Asis E, Tarazona-Jiménez J, Nivin-Vargas L, Norabuena-Figueroa R, Guzman-Avalos M, et al. Predictable inventory management within dairy supply chain operations. IJRDM. 2023;53(3):1–17.
- 11. Shahsavari M, Hussain OK, Sharma P, Saberi M. Modelling supply chain risk events by considering their contributing events: a systematic literature review. Enterprise Inform Syst. 2025;19(5–6).
- 12. Ali I, Modibbo UM, Bolaji AL, Garg H. Optimization and Computing using Intelligent Data-Driven Approaches for Decision-Making: Artificial Intelligence Applications. In: Optimization and Computing using Intelligent Data-Driven Approaches for Decision-Making: Artificial Intelligence Applications. CRC Press; 2024.
- 13. Celig T, Ockenga TA, Schoder D. Distributional equality in Ethereum? On-chain analysis of Ether supply distribution and supply dynamics. Humanit Soc Sci Commun. 2025;12(1).
- 14. Gabellini M, Civolani L, Calabrese F, Bortolini M. A Deep learning approach to predict supply chain delivery delay risk based on macroeconomic indicators: a case study in the automotive sector. Appl Sci. 2024;14(11):4688.
- 15. Hudnurkar M, Renji KM, Ambekar S, Sahu G, Joseph KM. Predicting Delays for Truck Delivery Logistics: An Application of AI and ML. In: 2024 Ninth International Conference on Science Technology Engineering and Mathematics (ICONSTEM). 2024. p. 1–10.
- 16. Aloini D, Benevento E, Dulmin R, Guerrazzi E, Mininno V. Unlocking Real-Time Decision-Making in Warehouses: A machine learning-based forecasting and alerting system for cycle time prediction. Transport Res Part E. 2025;194:103933.
- 17. Abdulrazaq MA, Fan W. A priority based multi-level heterogeneity modelling framework for vulnerable road users. Transportmetrica A. 2025.
- 18. Bo S, Xiao M. Root cause attribution of delivery risks via causal discovery with reinforcement learning. Algorithms. 2024;17(11):498.
- 19. Yılmaz ÖF, Guan Y, Yılmaz BG, Yeni FB, Özçelik G. A comprehensive methodology combining machine learning and unified robust stochastic programming for medical supply chain viability. Omega. 2025;133:103264.
- 20. Wyrembek M, Baryannis G, Brintrup A. Causal machine learning for supply chain risk prediction and intervention planning. Int J Product Res. 2025b;63(15):5629–48.
- 21. Abdulrazaq MA, Fan WD. Temporal dynamics of pedestrian injury severity: a seasonally constrained random parameters approach. Int J Transport Sci Technol. 2025;20:237–57.
- 22. Liu J, Zhang X, Xiong H. Credit risk prediction based on causal machine learning: Bayesian network learning, default inference, and interpretation. J Forecasting. 2024;43(5):1625–60.
- 23. Feuerriegel S, Frauen D, Melnychuk V, Schweisthal J, Hess K, Curth A, et al. Causal machine learning for predicting treatment outcomes. Nat Med. 2024;30(4):958–68. pmid:38641741
- 24. Uzun Ozsahin D, Duwa BB, Ozsahin I, Uzun B. Quantitative forecasting of malaria parasite using machine learning models: MLR, ANN, ANFIS and random forest. Diagnostics. 2024a;14(4):385.
- 25. Uzun Ozsahin D, Duwa BB, Ozsahin I, Uzun B. Quantitative forecasting of malaria parasite using machine learning models: MLR, ANN, ANFIS and random forest. Diagnostics. 2024b;14(4):385.
- 26. Gruosso G, Spateri E. Generating synthetic data from real-time simulators for deep learning modeling of machining. Processes. 2025;13(12):3953.
- 27. Kurian DS, Maneesh CR, Pillai VM. Supply chain inventory stockout prediction using machine learning classifiers. IJBDA. 2020;1(3):218.
- 28. Li D. Supply chain digitalization, corporate governance, and enterprise carbon emissions. Int Rev Fin Anal. 2025;102:104115.
- 29. Gao Q, Wang J, Fang R, Sun H, Wang T. A doubly robust estimator for continuous treatments in high dimensions. BMC Med Res Methodol. 2025;25(1):35. pmid:39948447
- 30. John ER, Crowther MJ, Didelez V, Sheehan NA. Multiplicative versus additive modelling of causal effects using instrumental variables for survival outcomes - a comparison. Stat Methods Med Res. 2025;34(1):3–25. pmid:39659143
- 31. Chesnaye NC, Stel VS, Tripepi G, Dekker FW, Fu EL, Zoccali C, et al. An introduction to inverse probability of treatment weighting in observational research. Clin Kidney J. 2021;15(1):14–20. pmid:35035932
- 32. Aljohani A. Predictive Analytics and Machine Learning for Real-Time Supply Chain Risk Mitigation and Agility. Sustainability. 2023;15(20):15088.
- 33. Usman AG, Almousa M, Daud H, Duwa BB, Suleiman AA, Ishaq AI, et al. Second-order based ensemble machine learning technique for modelling river water biological oxygen demand (BOD): Insights into improved learning. J Radiation Res Appl Sci. 2025;18(2):101439.
- 34. Yin L, Li B, Li P, Zhang R. Research on stock trend prediction method based on optimized random forest. CAAI Trans Intel Tech. 2023;8(1):274–84.
- 35. Usman AG, Işik S, Abba SI. Qualitative prediction of Thymoquinone in the high‐performance liquid chromatography optimization method development using artificial intelligence models coupled with ensemble machine learning. Separat Sci Plus. 2022;5(10):579–87.
- 36. Han Y, Kim J, Enke D. A machine learning trading system for the stock market based on N-period Min-Max labeling using XGBoost. Expert Syst Appl. 2023;211:118581.
- 37. Zheng X, Cai J, Zhang G. Stock Trend Prediction Based on ARIMA-LightGBM Hybrid Model. In: 2022 3rd Information Communication Technologies Conference (ICTC). 2022: 227–31.
- 38. Gupta S, Nachappa S, Paramanandham N. Stock market time series forecasting using comparative machine learning algorithms. Procedia Comput Sci. 2025b;252:893–904.
- 39. Wang L, Chen L, Jin S, Li C. Forecasting the green behaviour level of Chinese enterprises: A conjoined application of the autoregressive integrated moving average (ARIMA) model and multi-scenario simulation. Technol Soc. 2025;81:102825.
- 40. van Eck NJ, Waltman L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics. 2010;84(2):523–38. pmid:20585380