Demand forecasting for platelet usage: From univariate time series to multivariable models

Platelet products are both expensive and have very short shelf lives. As usage rates for platelets are highly variable, the effective management of platelet demand and supply is very important yet challenging. The primary goal of this paper is to present an efficient forecasting model for platelet demand at Canadian Blood Services (CBS). To accomplish this goal, five different demand forecasting methods, ARIMA (Auto Regressive Integrated Moving Average), Prophet, lasso regression (least absolute shrinkage and selection operator), random forest, and LSTM (Long Short-Term Memory) networks are utilized and evaluated via a rolling window method. We use a large clinical dataset for a centralized blood distribution centre for four hospitals in Hamilton, Ontario, spanning from 2010 to 2018 and consisting of daily platelet transfusions along with information such as the product specifications, the recipients’ characteristics, and the recipients’ laboratory test results. This study is the first to utilize different methods from statistical time series models to data-driven regression and machine learning techniques for platelet transfusion using clinical predictors and with different amounts of data. We find that the multivariable approaches have the highest accuracy in general, however, if sufficient data are available, a simpler time series approach appears to be sufficient. We also comment on the approach to choose predictors for the multivariable models.


Introduction
Platelet products are a vital component of patient treatment for bleeding problems, cancer, AIDS, hepatitis, kidney or liver diseases, traumatology and in surgeries such as cardiovascular Figure 1.CBS blood supply chain with one regional blood centre and multiple hospitals determine the influence of demand history as well as clinical predictors on demand forecasting.The first two models are univariate time series that only consider the demand history, while the remaining three methods, multivariate regression, random forest, and artificial neural networks, consider clinical predictors.These five methods are utilized to pursue the following goals: i) more precise platelet demand forecasting for the benefit of both CBS and hospitals, ii) reducing the bullwhip effect, as a consequence of effective demand forecasting; and iii) investigating the impact of clinical predictors on the platelet demand.The main contributions of this study are as follows: model performance and provide a holistic evaluation and comparison for different forecasting methods to evaluate the effectiveness of these models for different data types, providing suggestions on using these robust demand forecasting strategies in different circumstances.Results show that when having a limited amount of data (two years in our case), multivariate models outperform the univariate models, whereas having a large amount of data (eight years in our case) results in the ARIMA model performing nearly as well as the multivariate methods.
The rest of this paper is organized as follows.In Section 2 we provide a literature review of demand forecasting methods for blood products, with a focus on platelets.Section 3 provides the data description and an overview of the five models used for forecasting platelet demand.The main results of the study are presented in Section 4. In Section 5, a comparison of the models is provided, and finally, in Section 6, concluding remarks are provided, including a discussion of ongoing work for this problem.

Literature Review
There is a limited literature on platelet demand forecasting; most investigates univariate time series methods.In these studies, forecasts are based solely on previous demand values, without considering other features that may affect the demand.Critchfield et al. (1985) develop models for forecasting platelet usage in a blood centre using several time series methods including Moving Average (MA), Winter's method and Exponential Smoothing (ES).Silva Filho et al. (2012) develop a Box-Jenkins Seasonal Autoregressive Integrated Moving Average (BJ-SARIMA) model to forecast weekly demand for blood components, including platelets, in hospitals.They later extend this work in (Silva Filho et al., 2013).Kumari and Wijayanayake (2016) propose a blood inventory management model for the daily supply of platelets focusing on reducing platelet shortages by applying three time series methods, MA, Weighted Moving Average (WMA) and ES.Volken et al. (2018) use generalized additive regression and timeseries models with ES to predict future whole blood donation, including platelets, and RBC transfusion trends.Fanoodi et al. (2019) use artificial neural networks and ARIMA models to forecast platelet demand by considering daily demands for eight types of blood platelets.They consider different demand data lags, 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 1 week, 15 days, 30 days, 90 days, 120 days, and 365 days, as the input data for the artificial neural networks.
On the other hand, many studies focus on univariate whole blood demand forecasting rather than a specific blood product, using time series or machine learning models.Frankfurter et al. (1974) develop transfusion forecasting models using ES methods for a blood collection and distribution centre.Fortsch and Khapalova (2016) apply various approaches to predict blood demand such as Naïve, MA, ES, and multiplicative Time Series Decomposition (TSD), amongst which a Box-Jenkins (ARMA) approach results in the highest prediction accuracy.Lestari et al. (2017) apply four time series models, MA, WMA, ES and ES with trend, to forecast demand for blood components.Twumasi and Twumasi (2022) apply K-Nearest Neighbour regression (KNN), Generalised Regression Neural Network (GRNN), Neural Network Auto-regressive (NNAR), Multi-Layer Perceptron (MLP), Extreme Learning Machine (ELM), and an LSTM network for forecasting and backcasting blood demand to predict future and lost past demand data respectively, by using a rolling-origin procedure.
Several recent studies include additional features other than demand history for demand forecasting.Drackley et al. (2012) estimate long-term blood demand for Ontario, Canada based on previous transfusions' age and sex-specific patterns.They forecast blood supply and demand for Ontario by considering demand and supply patterns, and demographic forecasts, with the assumption of fixed patterns and rates over time.Khaldi et al. (2017) apply Artificial Neural Networks (ANNs) to forecast the monthly demand for three blood components, red blood cells (RBCs), platelets and plasma for a case study in Morocco.Guan et al. (2017) propose an optimization ordering strategy in which they forecast the platelet demand for several days into the future and build an optimal ordering policy based on the predicted demand, concentrating on minimizing the wastage.Their main focus is on an optimal ordering policy and they integrate their demand model in the inventory management problem, meaning that they do not try to precisely forecast the platelet demand.Li et al. (2021) develop a hybrid model consisting of seasonal and trend decomposition using Loess (STL) time series and eXtreme Gradient Boosting (XGBoost) for RBC demand forecasting and incorporate it in an inventory management problem.
In this study, we utilize multiple demand forecasting methods, including univariate analysis (time series methods) and multivariate analysis (regression and machine learning methods), and evaluate the performance of these models for platelet demand forecasting.We explore the value gained from including a range of clinical predictors for platelet demand forecast models.More specifically, we consider clinical predictors, consisting of laboratory test results, patient characteristics and hospital census data as well as operational related indicators, including the previous week's platelet usage and previous day's received units with the aim of accurate demand forecasting.In addition to the linear effects of the clinical predictors, we study the nonlinear effect of these clinical predictors in our choice of machine learning models.Moreover, we explore the effect of having different amounts of data on the accuracy of the forecasting methods.To the best of our knowledge, this study is the first that utilizes and evaluates different demand forecasting methodologies from univariate time series to multivariate models for platelet products and explores the effect of the amount of available data on these approaches.

Methods
In this section, we present the general problem setting, a comprehensive data description and the methods used for data exploration.We also provide an overview of the five models used for forecasting platelet demand, an overview of the rolling window analysis used for retraining the models, and the error measures used for evaluation.

Problem Setting
In this study, we consider a blood supply system consisting of one regional CBS distribution centre and four major hospitals operating in the city of Hamilton, Ontario.As a result of internal inventory management practices, these four hospitals are considered as one entity.At the beginning of the day, hospitals receive platelet products that were ordered on the previous day, from CBS.In the case of shortages, hospitals can place expedited (same-day) orders at a higher cost.Prior to September 2017, platelets had five days of shelf life, while after this date, the shelf life of platelets was increased to seven days.After exceeding the shelf life, platelet products are expired and discarded.

Data Description
The data in this study are constructed by processing CBS shipping data and the TRUST (Transfusion Research for Utilization, Surveillance and Tracking) database at the McMaster Centre for Transfusion Research (MCTR) for platelet transfusion in Hamilton hospitals.The study is approved by the Canadian Blood Services Research Ethics Board and the Hamilton Integrated Research Ethics Board (HiREB number 7293).The data are high dimensional, with more than 100 variables that can be divided into four main groups: 1. the blood inventory data such as product name and type, received date, expiry date, 2. patient characteristics such as age, gender, patient ABO Rh blood type, 3. the transfusion location such as intensive care, cardiovascular surgery, hematology, and 4. available laboratory test results for each patient such as platelet count, hemoglobin level, creatinine level, and red cell distribution width.The laboratory tests are prescribed by physicians based on clinical needs and can help to decide whether a patient needs platelet transfusion.In this research, the laboratory test results are processed and used along with other information to forecast future platelet demand.
Additionally, we add new calculated predictors such as the number of platelet transfusions in the previous day and previous week, the number of received units in the previous day, and day of the week.Table 1 gives the set of predictors that are used in this study along with their descriptions.These predictors are selected by a lasso regression model (Tibshirani, 1996) which is explained in detail in Section 3.4.2.As we can see from

Exploratory Analysis for Trends, Seasonality and Holiday Patterns
In order to propose a short-term demand forecasting model, we first explore the data for identifying temporal (daily/monthly) patterns that can inform our demand forecasting techniques.In particular, we investigate correlations among the predictors, seasonality, day of the week, and non-stationarity effects.textbfIdentify non-stationarity: The Augmented Dickey-Fuller (ADF) test (Cheung and Lai, 1995) is applied on the time series data to examine the stationarity.
Identify seasonality: We apply the Seasonal and Trend decomposition using Loess (STL) model to decompose the time series data into trend, seasonality, and residuals.We also apply the one-way ANOVA test to compare the means of the transfused units in different months, and the means of the transfused units during weekdays and weekends.Moreover, we explore the trend, holidays, weekly seasonality, and yearly seasonality using the Prophet model, explained in Section 3.4.1.Identify day of the week effect: We also compare the mean daily units transfused based on day of the week by plotting the mean against day of the week, and also by applying the t-test to compare the mean daily units transfused during weekdays and weekends.

Demand Forecasting Models
This section explains the five forecasting models used for forecasting the platelet demand in Hamilton hospitals.The ARIMA and Prophet models are univariate models that forecast the demand based on demand history.Lasso regression, random forest, and LSTM networks are multivariate models that consider various predictors in addition to demand history for forecasting the demand.

Univariate Models
The univariate models, ARIMA and Prophet, forecast the demand solely based on the previous demand values.The ARIMA model does not consider seasonality in data and is considered as a baseline model.The Prophet model, on the other hand, considers trend, seasonality, and holidays for forecasting the demand.

ARIMA Model
An autoregressive integrated moving average model consists of three components, an autoregressive (AR) component that considers a linear combination of lagged values as the predictors, a moving average (MA) component of past forecast errors (white noise), and an integrated component where differencing is applied on the data to make it stationary.Let y 1 , y 2 , . . ., y t be the demand values over time period t; the time series data can be written as: An ARIMA model assumes that the value of demand is a linear function of a number of previous past demand values and previous error values.Thus, the ARIMA model can be written as: where ŷt is the response variable (the predicted demand), µ is a constant, ϑ i and φ j are model parameters in which i = 1, 2, . . ., p and j = 0, 1, 2, . . ., q, p and q are the model orders and define the number of autoregressive terms and moving average terms, respectively.In order to fit an ARIMA model, first the ADF test is applied on the time series data to examine the stationarity, and the standard auto arima() function in Python is used for hyperpa-rameter tuning and determining the optimal model order.A function is developed in Python to implement the ARIMA model via a rolling-origin strategy.

Prophet Model
Prophet is a time series model introduced by Taylor and Letham (2018) that considers common features of business time series: trends, seasonality, holiday effects and outliers.The Prophet model was developed for forecasting events created on Facebook and is implemented as an open source software package in both Python and R. Let g t be the time series trend function which shows the long-term pattern of data, s t be the seasonality which captures the periodic fluctuations in data such as weekly, monthly or yearly patterns, and finally h t be the non-periodic holiday effect.These features are combined through a generalized additive model (GAM) (Hastie and Tibshirani, 1987), and the Prophet time series model can be written as: The normally distributed error ε t is added to model the residuals.We use the Prophet library in Python for implementing the Prophet model and develop a function for implementation via a rolling-origin strategy.

Multivariate Models
In order to explore the effect of including clinical predictors in the forecasting process, in the next step we introduce three multivariate models that incorporate clinical predictors: lasso regression, random forest, and LSTM networks.These machine learning models are implemented to forecast the demand based on demand history and multiple predictors.Lasso regression is used as a forecasting model and a variable selection method to select the most relevant predictors for the multivariate models.

Lasso Regression
We use lasso regression (Tibshirani, 1996) since it allows predictors to be included in the demand forecasting process.The lasso regression model performs variable selection to reduce the complexity of the model, as well as improving the prediction accuracy.By considering the actual demand on day t (t = 1, 2, ..., N) as y t and the predicted demand on day t as the product of the clinical predictors (z t j ) and their corresponding coefficients β j , where j = 1, 2, ..., M specifies the clinical predictor, the lasso regression is the solution to the following optimization problem: The optimization problem defined in ( 4)-( 5) chooses the coefficients, β , that minimize the sum of squares of the errors between the actual values (y) and the response variable values, with a sparsity penalty (λ ) on the sum of the absolute values of the model coefficients.Constraint (5) forces some of the coefficients (that have a minor contribution to the estimate) to be zero.predictors that have non-zero coefficients are selected in the model, and the response variable is calculated as follows: In this study, lasso regression is used as a variable selection method to find important predictors for platelet demand.Subsequently, this information is used for demand forecasting.We use the LassoCV function from the sklearn package in Python to implement the lasso regression.The penalty coefficient λ is chosen through five-fold cross-validation.A function is developed to implement the lasso regression via a rolling-origin strategy.

Random Forest
Random forests, first proposed by Ho (1995), are ensemble methods that use decision trees.We chose to explore random forests as they can capture nonlinear relationships between predictors while also being interpretable, as what a decision tree does can be understood by simply looking at it.Decisions trees in a forest are trained using bootstrapped samples and are only allowed to consider a subset of the predictors when choosing splits.Considering the actual demand on day t as y t , and the set of days in the bootstrap samples as D, a tree starts with a root node that has an attached value µ: This node creates two child nodes that separate data points based on a clinical predictor, u, where one node gets data with the value of u on day t (z tu ) less than a value v and the other node gets data with z tu greater than or equal to v.These child nodes have attached values calculated in the same way as the root, µ 1 = 1 |{t|z tu <v}| ∑ t:z tu <v y t and µ 2 = 1 |{t|z tu ≥v}| ∑ t:z tu ≥v y t .The split measures, u and v, are chosen by minimizing the variance of the model.A random forest grows a number of these trees, K, and produces a prediction for a set of clinical predictors, z t , by averaging together the predictions of each of the trees: where each tree T i takes a set of clinical predictors and traverses the nodes of tree i using the splits found with the above equations.Forecasting problems can have linear or nonlinear relationships among the model predictors.Random forests can work on both linear and nonlinear data, and are able to capture nonlinear dependencies among the predictors.We use the Ran-domForestRegressor function from the scikit-learn package in Python to implement the random forest.Hyperparameter tuning is achieved by using grid search on the number of trees, maximum tree depth, and the number of features to consider when looking for the best split.The best split in a tree is chosen by minimizing MSE (Mean Square Error) and five-fold crossvalidation is used to reduce overfitting.We developed a function in Python to implement the random forest model via a rolling-origin strategy.

LSTM Network
LSTM networks are a class of recurrent neural networks (RNN) that were introduced by (Hochreiter and Schmidhuber, 1997) and are capable of learning long-term dependencies in sequential data.In theory, RNNs should be capable of learning long-term dependencies, however they suffer from the so-called vanishing gradient problem.Consequently, LSTM networks are designed to resolve this issue.An LSTM network maps a set of input neurons (also called units) to a set of output neurons through a hidden layer.A neuron or unit in an LSTM network consists of an input gate (i t ), a forget gate ( f t ), a cell state (c t ), and an output gate (o t ).
The hidden layer output can be written as a function of the gates, the model input (here the clinical predictors (z t )), and the previous output of the hidden layer: The output of the LSTM network, here the demand forecasts, is calculated as a weighted value of the hidden layer output plus a bias, b: Like random forests, LSTM networks are able to capture nonlinear dependencies among the predictors.We implement the LSTM network using the TensorFlow package (Abadi et al., 2016).The LSTM network is trained by using the ADAM optimizer (Kingma and Ba, 2014), and MSE is used as the loss function for this optimizer.For hyperparameter tuning, grid search is performed to find the best model parameters (including the number of epochs, batch size, and number of hidden layers) toward the minimum MSE.Moreover, 10-fold cross-validation is used to reduce overfitting.A function is developed in Python to implement the LSTM network model via a rolling-origin strategy.

Rolling Window Analysis
We fit the forecasting models multiple times in order to collect multiple out-of-sample onestep ahead forecast errors by using a rolling window.The rolling window is used as part of the demand forecasting process to periodically retrain the models and use more recent data.The flowchart of the proposed demand forecasting system is given in Figure 2. We retrain each model periodically, according to two parameters, the training window and the retraining period.When we retrain a model, we use a training window of the most recent data.For evaluation, we consider a rolling-origin evaluation, similar to the one presented in (Tashman, 2000).Many studies consider a fixed-origin evaluation, but we consider a rolling-origin eval-uation to improve the efficiency and reliability of out-of-sample tests (Tashman, 2000).In a rolling-origin evaluation, the forecasting origin is successively updated and new forecasts are produced from each new origin.We set the forecasting window and rolling steps to be the same as the retraining period.

Results
This section presents the results of the exploratory analysis for trends, seasonality and holiday patterns.We also present demand forecasting comparisons for univariate and multivariate models, and the forecasting performance of the models trained with training window sizes of two and eight years and retraining periods of 1, 7, 30, and 90 days.We implement the models to forecast the daily demand aggregated over four hospitals for one day ahead via a rolling-origin strategy.We periodically retrain our models based on the rolling window analysis explained in Section 3.5.

Trends, Seasonality and Holiday Patterns
The data analysis ranges from 2010/01/01 to 2018/12/31.An initial observation is that the demand is highly variable, with a transfused daily average of 17.90 units and a standard deviation of 7.05 units.
Observations for non-stationarity: The results of the ADF test show that the data is not stationary (P value = 0.085) before 2016, but it becomes stationary from 2016 onwards (P value <0.001).
Observations for seasonality: Figure 3 shows the time series data decomposition using the STL model.As we can see in the seasonal part, there are recurring temporal patterns in the data.The results of the one-way ANOVA test also show that there is a significant difference between the means of the daily transfusions during weekdays and weekends (F = 5.13, P value <0.001) and the means of daily transfusions in different months (F = 3.94, P value <0.001), which provide strong evidence in favour of the presence of weekly and monthly seasonalities.Since the data becomes stationary from 2016 onwards, we also explore the trend, holidays, weekly seasonality, and yearly seasonality (seasonality within a year) starting from 2016 using the Prophet model.As we can see from Figure 4, there is a downward trend from the beginning of 2016 to July 2017 and an upward trend from July 2017 to the end of 2018.Almost all holidays have a negative effect on the model, except for July 1st.This means that the demand is lower than regular weekdays for almost all of the holidays, except for July 1st.We can also see that there is weekly seasonality in which Wednesdays have the highest platelet usage while the weekends have the lowest usage.Moreover, the yearly seasonality, captured by Fourier series in the Prophet model, depicts three cycles: 1. January to May in which March has the highest demand while May has the lowest demand; 2. May to September in which the demand is highly variable.July has the highest demand in this cycle and the highest demand of all months while May has the lowest demand in the cycle and also the lowest demand of all months; 3. September to January with a slight variation in demand -November with the highest and January with the lowest demands.
Observations for day of the week effect: Next, we examine the univariate models' residuals via the ACF (Autocorrelation Function).Figure 7 gives the coefficients of correlation between a value and its lag for ARIMA and Prophet.As we can see in Figure 7(a), there is an autocorrelation at time seven (and multiples of seven) due to weekly seasonality that is not incorporated in the model.Since seasonality is one of the primary features of our time series data, we include seasonality directly in the forecasting process by using the Prophet model.As we can see in Figure 7(b), there is no repeated autocorrelation pattern for Prophet residuals.
We also perform a pairwise t-test to compare the the univariate models' residuals with each other.The results show a statistically significant difference between the ARIMA residuals (mean [sd]: 0.39 [6.80]) and Prophet residuals (mean [sd]: -0.07 [5.90], t-test: 95% confidence interval for the difference in means: (0.08, 0.85), P value = 0.018), indicating higher residuals in the ARIMA model.

Demand Forecasting Comparisons for Multivariate Models
We begin this section with an examination of selecting the clinical predictors for the multivariate models.Next, we compare the forecasts generated by the multivariate models and the actual demand.

Selecting the predictors using Lasso Regression
As discussed in Section 3.2, the data has more than 100 features, and we select predictors via lasso regression.The 29 clinical predictors that are introduced in Section 3.2 are selected by lasso regression and used for training the multivariate models.One of the data characteristics is that clinical predictors are highly correlated.These high correlations can affect the performance of a regression model, mainly because of the violation of model assumptions.
We calculate the Pearson correlation between the selected predictors.The Pearson correlation measures the linear relationship between two variables, ranging from -1 to 1, where -1 corresponds to a perfect negative correlation and 1 corresponds to a perfect positive correlation.
In terms of the retraining periods, retraining the models less frequently reduces the variability of the error.If we compare Figure 11(a) with Figure 11(g), we see that the error is less variable in Figure 11(g) for all the models, similarly for MAE in Figure 12.This can also be verified from the results in Table 3, where we see lower standard deviations as we move down to retraining every 90 days.3, increasing the training window size does not necessarily decrease the errors.There is similar behavior for the retraining periods, especially for the multivariate models, but we see that retraining less frequently results in less variable errors.Overall, the results indicate that while univariate models can benefit from a larger training window size and frequent retraining, the performance of the multivariate models is not affected by a larger training window, meaning that these models have robust performance with different data volumes.

Comparison and Discussion
In this section, we compare the models and provide recommendations for using these models in various scenarios.In Section 5.1 we compare the models based on a training window of two years, in Section 5.2 we discuss the impact of an increased amount of data on the forecasting models, and in Section 5.3 we discuss the effect of different retraining periods on the models.Finally, in Section 5.4 we provide the overall methodological implications of the study and in Section 5.5 discuss managerial implications.

Univariate versus Multivariate Models
We have presented five different models for platelet demand forecasting that can be divided into two groups: univariate and multivariate.Univariate models, ARIMA and Prophet, forecast future demand based only on the demand history.Although the ARIMA model only considers a limited number of previous values for forecasting the demand, retraining it every day, week or month leads to a slight performance improvement.The Prophet model incorporates the historical data, seasonality and holiday effects into the demand forecasting model which results in an improvement in the forecasting accuracy by approximately 10% compared to ARIMA.This highlights the impact of weekday/weekend and holiday effects in the platelet demand variation.As we discussed in Section 4.1, there is a weekday/weekend effect for platelet demand, which is not (directly) captured in the ARIMA model.Multivariate models, on the other hand, incorporate clinical predictors as well as historical demand data for demand forecasting.We use lasso regression to select the dominant clinical predictors that affect the demand.Lasso regression examines the linear relationship among the clinical predictors and their influence on the demand.However, as presented in Figure 8, there are correlations among the clinical predictors.There may also be nonlinear relationships among these clinical predictors that cannot be captured by a linear regression model.These issues motivated us to use two machine learning approaches, random forest and LSTM network.Random forest is capable of capturing nonlinearities among variables and its forecasting method of averaging past values provides some contrast to LSTM network's modelling approach.An LSTM network can also account for nonlinearities among variables.Moreover, an LSTM network is capable of retaining past information while forgetting some parts of the historical data.As we can see from Table 3, in general, random forest, LSTM network and lasso regression have low forecast errors for different training window sizes, owing to the inclusion of the clinical predictors.

Two Years versus Eight Years of Data
As discussed in Section 3.4, we train our models for two training window sizes, with two years and eight years of data, respectively.Since there is no trend in the data from 2016 onwards (see Figure 3), in the first scenario the models are trained for two years (training window size of two years, starting from 2016).With this amount of data and by retraining every year, forecasts are not accurate for univariate time series approaches, and one needs to include the clinical predictors in the forecasting model.However, by considering a training window of eight years, the ARIMA model's performance improves by approximately 20%, compared to the case of a two year training window.The Prophet model's performance also improves when more data are available, specifically when it is trained less frequently (30 and 90 days retraining windows).
In general the multivariate models result in small forecasting errors for two years of data for training, and do not perform significantly better as the amount of data increases, which shows that there is not much sensitivity to the training window size.This highlights the importance of including the clinical predictors in the forecasting process.

Different Retraining Periods
We also compare different retraining periods and provide insight on how to choose the appropriate retraining period for this data (and in general).Our results show that considering different retraining periods does not affect the models in the same manner.While in general all the models benefit from retraining more frequently, univariate models benefit more.For the univariate models, the greatest performance increase is for the ARIMA model when retrained every day, resulting in a decrease of 50% in MAPE and SMAPE.For the multivariate models, lasso regression has an impressive performance increase when retrained every day, while random forest and LSTM networks show less sensitivity to the retraining period.So, by considering the overhead of retraining these models more frequently, one may decide to choose to use a larger retraining window for random forest and LSTM networks.
Generally, if the retraining period is small, meaning that the models are retrained more frequently, the mean forecast accuracy representing the long-term overall performance is improved.

Methodological implications of the study
In general, when there is access only to previous demand values, using a univariate model and retraining it frequently is effective.In the case that several data variables are available, lasso regression, random forest models and LSTM networks can forecast the demand with higher accuracy even when a small amount of data is available and without frequent retraining.Forecasting problems can have linear or nonlinear relationships among the model variables.Due to the fact that LSTM networks can work on both linear and nonlinear time series, and are able to capture nonlinear dependencies, they can outperform linear regression models when long term correlations exist in the time series.Based on the LSTM results, we conclude that long term correlations and nonlinearity are not major issues for our data since the LSTM model does not significantly outperform lasso regression.
While LSTM networks perform well even with a limited amount of data and they can capture nonlinear relationships, they lack interpretability.Interpretability is an important feature of any prediction model used in a safety critical setting like blood product distribution.Considering the time and memory complexity, and interpretability of these models, lasso regression has lower time and memory complexity while it is also very interpretable.Random forest models maintain interpretability while also having the ability to capture nonlinear relationships.Random forests do well when their training data has good coverage of the different feature combinations the model is forecasting.This is because random forest models make forecasts for a set of features by averaging together similar data points from the training data.This allows random forest models to extract nonlinear relationships but also means they cannot extract trends effectively and may need a large amount of data in order to work well.This can be seen in our model (see Table 3), a training window of eight years, with more training data points to reference, has a small improvement in the error measures over a training window of two years for different retraining periods.
Training random forest models and LSTM networks requires expertise in the machine learning area since poor training will cause low-precision results.It is also worth mentioning that the LSTM network is a robust learning model and is capable of learning linear and nonlinear relationships among the model variables even in very short time series data (Boulmaiz et al., 2020;Lipton et al., 2015).However, as the number of inputs increases, both the data variables that make data wide and the data rows that make data tall, LSTM performance tends to decrease because it is highly dependent on the input size.Moreover, wide data results in model overfitting (Lai et al., 2018).Having wide data, one can apply a feature selection method such as lasso regression to reduce the number of variables and regularize the input.
One limitation of forecasting models is that they cannot capture sharp peaks in demand.Figure 15 depicts the actual and predicted demands for the second half of 2018 with a training window of two years and a retraining period of 7 days (retraining weekly) using lasso regression.It appears that the model does some degree of smoothing and thus cannot detect the sharp peaks.One possible explanation is that regression models are regressed on the expectation of the outcome, and are not good at capturing the extreme deviations from this expectation.However, as shown in Figure 15, smoothing mostly occurs for the maxima rather than the minima.In other words, the model potentially has large errors when there is excess demand, for example in emergency situations.The results presented in Sections 4.2 and 4.3 support that all the models struggle with capturing the peaks in demand.To sum up, when a sufficient amount of data is available, using a univariate model results in a low forecast error, particularly in the case that it is retrained every day.Specifically, when there is only access to the previous demand (as is currently the case for CBS) and adequate historical data are available, one can benefit from a simple univariate model like ARIMA or Prophet, since univariate models are simpler than the multivariate models.Multivariate models are useful when there is access to a limited amount of data.Also, they do not necessarily require frequent retraining, which may be an important implementation concern.

Managerial implications of the study
The short shelf life of platelets results in wastages which not only incur large costs but also affect the environment since they cannot be reused, recycled, or recovered (Jemai et al., 2020).Moreover, since platelet demand is highly variable, urgent same-day deliveries are placed frequently.Apart from the high cost of urgent orders, platelet shortage can increase the risk of putting patients' lives in danger.Currently, blood suppliers are not aware of the demand at the hospitals since hospitals hold excess inventory to manage the highly variable platelet demand.Indeed, this has its roots in the bullwhip effect, that is hospitals tend to order more than their actual demand.We see an opportunity to better coordinate supply (number of units received) with demand (number of units transfused) through the development of a daily demand predictor.Forecasting the demand improves the transparency between blood suppliers and hospitals, and helps blood suppliers to make better-informed decisions.
From the clinical perspective, accurate demand forecasting is important for clinical and supply chain management purposes.Demand forecasting can be used for placing optimal platelet orders and for decision making in many parts of the supply chain such as donation planning, and resource and staff management.As we can see in Section 4, there is some fundamental limit to how accurate the demand forecasts can be, so one important challenge would be how to use the demand forecasts to inform an ordering policy in an effective manner.Clearly, fore-casts themselves do not reflect an optimal ordering decision but they can be used as additional information in building effective ordering/inventory management policies (incorporating such forecasts is one of our current research directions).
Moreover, this research provides a holistic analysis of the predictors that affect the platelet demand, including the clinical predictors, hospital locations, day of the week and demand history.This can help blood suppliers with adapting clinically relevant factors into the decision making process, like decisions regarding the assignment of transfusion related staff/resources (beds or equipment).
Overall, there is a significant caveat with all of these approaches in that there are still forecasting errors, in particular they all struggle with capturing peaks.These underestimations may cause significant concerns for using such forecasts directly as there is the danger of severe underestimation.Therefore, some adjustments may be required for using these forecasts according to specific objectives.For instance, one may need an optimization model for inventory control if the demand forecasts are used for inventory management.

Conclusion
In this study, we utilized two types of methods for platelet demand forecasting, univariate and multivariate methods.Univariate methods, ARIMA and Prophet, forecast platelet demand only by considering the historical demand information, while multivariate methods, lasso regression, random forest and LSTM networks, also consider clinical predictors.The error levels for the univariate models, particularly in the case that a small amount of data is available, motivates us to utilize clinical predictors to investigate their ability to improve the accuracy of forecasts.Results show that lasso regression, random forest and LSTM networks outperform the univariate methods when a limited amount of data are available.Moreover, since they include clinical predictors in the forecasting process, their results can aid in building a robust decision making and blood utilization system.However, their application is not limited to platelet products.We believe that they can be used in various areas, including healthcare in general, finance and climate studies, when data features are available.On the other hand, when there is access to a sufficient amount of data, the marginal improvement for a simple univariate model such as ARIMA is higher than for multivariate models.In such scenarios, univariate models can be applied to historical data for demand forecasting, regardless of the product, which makes these models generalizable and widely applicable.
Future extensions of this work will include: (i) proposing an optimal ordering policy based on the predicted demand over a planning horizon with ordering cost, wastage cost and shortage (same-day order) cost; (ii) further exploring the lasso regression approach to enhance variable selection, with a particular focus on interpretability (this will not only affect the lasso regression itself, but also may improve LSTM forecasting accuracy since LSTM inputs are selected using lasso regression); (iii) more extensive empirical evaluation of the proposed models; (iv) exploring the generality of the results (outside of Hamilton).

Figure 2 .
Figure 2. Flowchart of the proposed system

Figure 3 .
Figure 3.Time series decomposition using STL method

Figure 4 .
Figure 4. Prophet model for exploring trends, holiday effects, weekly and yearly seasonality -Since these components are combined through a generalized additive model, the values of y-axes in the plots represent the quantity to be added to or substracted on each specific day

Figure 11 .Figure 12 .
Figure 11.RMSE with different training window sizes and retraining periods

Figure 13 .Figure 14 .
Figure 13.MAPE with different training window sizes and retraining periods

Figure 15 .
Figure 15.Demand forecasting with lasso regression with a training window of two years and a retraining period of seven days

Table 1 ,
predictors have different ranges, and hence are standardized by z-score normalization.All data processing and analysis and model implementations are carried out using the Python 3.7 programming language.

Table 1 .
Data variable definition and description

Table A1 of
Appendix A.
Table A1 of Appendix A.Figure10shows the actual daily units transfused and the forecasts generated by the multivariate models, lasso regression, random forest and LSTM network, with a training window of two years and by retraining every day.The forecast means of lasso (mean [sd]: 19.12 [3.62]) and random forest (mean [sd]: 19.72 are very to the actual mean demand, but forecast standard deviations much lower than the actual demand standard variation.LSTM network forecasts have a slightly lower mean (mean [sd]: 18.01 [3.55]) but significantly lower standard deviation than the actual demand.

Table 2 .
Comparison of multivariate models residuals using a pairwise t-test The performance of the forecasting models is computed based on the rolling-origin evaluation and by four error measures, RMSE, MAE, MAPE, SMAPE.The first two error measures, RMSE and MAE, are absolute measures while the remaining ones, MAPE and SMAPE, are relative measures.The errors are measured for each rolling origin for the test data and reported in Figures 11-14 and Table 3. Table 3 gives the mean and standard deviation of the errors for different training window sizes and retraining periods.Figures 11 and 12 compare the RMSE and MAE of the models trained with different training window sizes and retraining periods.As we can see in these figures and in Table

Table 3 .
Model performance with different training window sizes and retraining periods