Forecasting emergency department visits in the reference hospital of the Balearic Islands: The role of tourist and weather data

Paride Crisafulli; Angel del Río Mangada; Juan José Segura Sampedro; Claudio R. Mirasso; Raúl Toral; Tobias Galla

doi:10.1371/journal.pone.0343713

Abstract

Accurate forecasting of patient arrivals at emergency departments (EDs) is vital for efficient resource allocation and high-quality patient care. In this study we investigate the relevance of exogenous variables, namely tourism, weather, calendar and demographic variables, in forecasting ED visits in the reference hospital in Palma de Mallorca, a city with significant seasonal population fluctuations due to tourism. Using a machine learning approach, we develop a model that predicts ED visits based solely on these exogenous variables. We test different machine learning algorithms (random forests, support vector machines, and feedforward neural networks) with different combinations of input variables and compare their symmetric mean average percentage errors (SMAPEs). Our findings reveal that calendar information, resident, and tourist population data are statistically significant for the accuracy of the predictions, while the addition of weather data does not provide any further improvement. Comparison of non-time-series with time-series prediction models reveals that the latter provide better accuracy for short prediction horizons (e.g., shorter than a week). Furthermore, time-series models become less or equally accurate to models relying only on exogenous variables for long prediction horizons (e.g., fortnight or month). Our study highlights the importance of carefully selecting predictive variables to ensure short- and long-term, robust and reliable forecasts. This demonstrates that, despite their lower complexity, non-time-series models with well-chosen input variables can be as effective as time-series models when predicting for long time horizons.

Citation: Crisafulli P, del Río Mangada A, Sampedro JJS, Mirasso CR, Toral R, Galla T (2026) Forecasting emergency department visits in the reference hospital of the Balearic Islands: The role of tourist and weather data. PLoS One 21(3): e0343713. https://doi.org/10.1371/journal.pone.0343713

Editor: Tomo Popovic, University of Donja Gorica, MONTENEGRO

Received: August 1, 2025; Accepted: February 10, 2026; Published: March 13, 2026

Copyright: © 2026 Crisafulli et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data and code required to reproduce the study are available via a GitHub repository https://github.com/complexParide/son_espases (DOI: https://doi.org/10.5281/zenodo.18245781). This includes detailed shift-by-shift patient arrival numbers on each day in the training, validation and test periods, as well as daily patient arrivals in the low, medium and high risk groups. The input data for the prediction algorithm (calendar, population and weather) is also available in the repository.

Funding: The project that gave rise to these results received the support of a fellowship from ``la Caixa″ Foundation (ID 100010434, https://lacaixafoundation.org/en/). The fellowship code is LCF/BQ/DI22/11940041. This is an award to PC. We are also grateful for partial financial support from the Agencia Estatal de Investigacion and Fondo Europeo de Desarrollo Regional (FEDER, UE) under projects ID PID2021-122256NB-C21, PID2024-157493NB-C21 (supporting RT) and PID2021-122256NB-C22, PID2024-157493NB-C22 (supporting TG), funded by MICIU/AEI/10.13039/501100011033 (https://www.ciencia.gob.es/en/), and the Maria de Maeztu program for Units of Excellence CEX2021-001164-M (supporting PC, CM, RT, TG), funded by MICIU/AEI/10.13039/501100011033 (https://www.ciencia.gob.es/en/, https://www.aei.gob.es/). The funders played no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Accurate forecasting of the volume of patient arrivals in hospital emergency departments (EDs) is of vital importance in ensuring effective resource allocation and timely patient care. Despite the considerable attention this issue has received worldwide through numerous studies [1–25], it remains evident that there is no universally superior prediction algorithm [1,2].

All predictive models are trained using the existing ED data at a given location. Once the training has been carried out, the prediction of the number of arrivals on a particular date may or may not rely on patient numbers previous to the prediction date. We refer to models that, after training, continue to use past data to make predictions as “time-series models.” For example, to make a prediction for a particular date, a fully trained time-series model requires actual data for patient arrivals on the days before the target date. Time-series models include auto-regressive integrated moving average (ARIMA) models and variants [1–18], seasonal exponential smoothing [1–18], Holt-Winters methods [1–5], and recurrent or convolutional neural networks [1,2,5,11,12,16,19]. Time-series models typically perform well for short-term forecasting but degrade over longer horizons; while exogenous variables can be incorporated [1], this does not necessarily lead to improved predictive performance.

Other, simpler, approaches use existing arrival data only during training, and subsequently rely solely on exogenous variables to make predictions. The most commonly used exogenous variables are calendar information (day of the week, month, holidays) [1,2,4,6,8,12–15,18,20–23], weather data [1,2,8,10,12,14,15,18,20–22], online searches for relevant keywords such as “flu” [15,24], and variables specific to a particular location (such as the timing of the Oktoberfest [25]). We refer to models that, after training, do not use past ED data for future predictions as “non-time-series models.” We note that these models might still require temporal information (e.g., the target date) as input, but the crucial difference to time-series models is that no patient arrival numbers, for example on days immediately preceding the target date, are required once the training of the model is complete. In that sense, these models do not extrapolate an existing time series. Algorithms used for such models include feedforward neural networks [1,2], random forests [1,15], and Poisson regression [20]. Although generally less accurate than time-series models for short prediction horizons, non-time-series approaches maintain constant accuracy across different prediction horizons (as far as the exogenous variables allow). This makes these models more suitable for long-term, i.e., fortnightly or monthly, resource allocation than time-series models.

In this paper, we use a non-time-series machine-learning approach to predict ED visits at the major hospital on the island of Mallorca (Balearic Islands, Spain), located in its capital, Palma. The island attracts a considerable number of visitors (tourists as well as temporary workers), and during the summer months, the population approximately doubles. Therefore, it seems prudent to include the floating population (including tourists) as an exogenous variable, alongside calendar variables (including local holidays), resident population, and weather data. In order to characterize the impact of the different variables on prediction accuracy, we run the models with different combinations of input variables.

We find that, despite its simplicity, non-time-series models can yield symmetric mean average percentage errors (SMAPE) that are comparable to those obtained from more complex models in previous studies [1–22,24,25]. To assess whether the differences in prediction errors between time-series models and non-time-series models are statistically meaningful, we conducted a series of Diebold–Mariano tests. The results show that, although models incorporating tourist population as an exogenous variable yield prediction errors that are statistically smaller than those of models without tourist information, the resulting change in the predicted number of incoming patients (NIP) remains small, below two patients for a typical hospital shift. By contrast, our analysis indicates that weather variables can be omitted, with calendar and population data providing sufficient exogenous information.

Materials and methods

Data preprocessing and behavior

We used a dataset comprising all ED visits at Hospital Universitari Son Espases (HUSE) in Palma, Mallorca, from December 26, 2015, to December 31, 2022. HUSE is the largest public hospital in the Balearic Islands. The study falls under the category of “human subjects research” in the PLOS classification. The Ethics Committee of the Balearic Islands (CEIm-IB) is the relevant review board, and the committee has confirmed that this study requires neither approval by the Ethics Committee nor patient consent, since the data used in this study are of an aggregated, anonymous and non-sensitive nature. The official document by the CEIm-IB (text in Spanish and English language) can be consulted online [26]; as no direct intervention on patients or collection of personally identifiable data was involved, the study raised no issues regarding individual privacy.

The dataset was first accessed on the 19th of January 2023. Patients from pediatrics and gynecology were not included in the dataset. Each entry describes one patient arrival and subsequent processing in the ED (see below). Entries with an unrealistic length of stay were eliminated, reducing the number of entries from 824,718–824,695. We classify a duration of stay as unrealistic if it is either negative or longer than 50 days. This latter cutoff was chosen after discussion with medical practitioners from Son Espases Hospital. Instead of the commonly used 24-hour resolution, we used the shifts of the hospital personnel in our analysis. These are the morning shift (8:00–15:00), the afternoon shift (15:00–21:00), and the night shift (21:00–8:00).

The 15 columns for each entry in the original dataset (see S1 Text Original data columns) were processed into the following 6:

Date and shift of entry at the ED.
Date and shift of exit from the ED.
Sex of the patient at the time of the visit.
Age of the patient at the time of the visit.
If the patient was a resident in the Balearic Islands or a non-resident. To be identified as a resident, they need to have both Spain as the country of residence (España as Pais residencia) and the Balearic Islands as province of residence (Baleares as Provincia residencia).
Whether the patient was hospitalized immediately from the ED, or not. To do this, we checked if the entry for “reason of discharge from the ED” was hospitalization (Motivo alta was Paso a hospitalización).

To explicitly avoid the impact of the COVID-19 pandemic, all entries from March 1, 2020, to December 31, 2021, were excluded. As seen in Fig 1, this period was characterized by a sharp decrease in the number of incoming patients (NIP), followed by a gradual recovery after the end of the pandemic. While we recognize a possible interest in these pandemic-related patterns, the analysis of this data is outside the scope of the current investigation. By excluding this period of time, the dataset was reduced to a total of 634,357 entries. We note that post-COVID patterns may differ from those of the pre-COVID era due to the lasting societal effects of the pandemic. For this reason, we divided our dataset into the following subsets:

Download:

Fig 1. Number of incoming patients (NIP) to the ED of Son Espases.

Each point corresponds to the NIP for a specific day and shift. Each subfigure shows a different shift (morning in green, afternoon in red, night in blue). The purple points between the dates March 1, 2020 and December 31, 2021 are the values registered during the assumed pandemic period, and excluded from our analysis.

https://doi.org/10.1371/journal.pone.0343713.g001

A training dataset spanning December 26th, 2015 to February 28th, 2018, used to train all models.
A validation dataset covering the period March 1st, 2018, to February 28th, 2019. This dataset was used to tune the hyperparameters of our models.
A test dataset covering the period March 1st, 2019, to February 29th, 2020. This dataset was used to evaluate the performance of our models.
A post-COVID dataset covering a time period after the pandemic (January 1st, 2022, to December 31st, 2022). This dataset was not used in our primary analysis, but as an additional test dataset to study the accuracy of the prediction model (trained and validated on pre-COVID data) in the post-COVID period.

We make the four datasets available on a GitHub repository [27], along with the code used for our study.

The NIP exhibits significant variation across the three daily shifts, as can be seen by comparing the three panels of Fig 1. The most significant difference across shifts is in the seasonal behaviour of the data over the course of the calendar year. This seasonality is evident for the night shift, and almost absent for the morning shift. It is worth noting that despite the oscillations the numbers of incoming patients grow steadily over time (see also Fig 1). We attribute this to an increase of the population.

The behavior of NIP for non-residents is very different than that for residents (S2 Fig. NIP as a function of time, residents vs non-residents). Because of the low number of non-residents among the incoming patients (5.15% of the total entries), we did not attempt to make predictions for non-residents as a separate class. Although Mallorca attracts a large number of tourists, the proportion of non-residents among the patients in the dataset is low, among other reasons, because non-residents tend to attend private hospitals.

As the data does not show any obvious differences between sexes in ED attendance patterns (S3 Fig. NIP as a function of time, females vs males), we do not attempt to predict sex-specific NIPs. 55.48% of all patients attending the ED were female and 44.52% male. The reasons for this imbalance are beyond the scope of this paper. Our dataset includes three patients without any information about their sex. We did not exclude these patients from the dataset as we did not perform any sex-specific analysis or prediction.

Age cohorts and hospitalization risk

Table 1 shows the most frequent reasons for a patient’s discharge from the ED. The primary concern for hospital resource allocation is whether an emergency department visit leads to hospitalization. As hospitalization risk varies with patient age, we provide separate predictions for the different risk groups defined below.

Download:

Table 1. Counts and percentages for reasons of discharge (Motivo alta). This table includes only reasons for discharge that appear in the dataset with a percentage higher than 0.01%.

https://doi.org/10.1371/journal.pone.0343713.t001

We define the hospitalization risk for a given patient cohort as the fraction of patients in the dataset who are admitted to the hospital following an emergency department visit, rather than being discharged. Fig 2 illustrates how increases with age. This age dependence motivates dividing patients into risk-based cohorts. We define the following classes:

Download:

Fig 2. Hospitalization risk h as a function of the patient’s age cohort.

h is the fraction of patients of a specific cohort who were hospitalized immediately after attending the ED. This is used to group patients in three different risk groups: low-risk group (h < 0.1, green line), medium-risk group (0.1 < h < 0.2, orange line), and high-risk group (h > 0.2, red line).

https://doi.org/10.1371/journal.pone.0343713.g002

Low-risk group (): Ages 15–54.
Medium-risk group (): Ages 55–74.
High-risk group (): Ages 75 and above.

Data for admission into hospital beds from the ED at HUSE has a resolution of 24 hours. Therefore, we make daily predictions of NIP for the different risk groups and not shift-based predictions.

Input variables and models

Our choice of exogenous input variables is based on a number of observations. For example, the NIPs of the afternoon and night shifts exhibit considerable dependence on the time of the year (Fig 3, upper right and bottom panels). We attribute this pattern to an increased overall population and engagement in risky activities during the summer months. On the contrary, arrivals during the mornings are largely independent of the time of the year (Fig 3, upper left panel). Additionally, the NIP is different on different days of the week, particularly for the morning and afternoon shifts (Fig 3, upper panels) being highest on Mondays, and lowest at the weekends. Based on these observations, we included calendar and population variables in the prediction models; additionally, we used weather data, following existing studies [1,2,8,10,12,14,15,18,20–22]. More precisely, the inputs for our models are as follows:

Download:

Fig 3. Average NIP in the different shifts for given weekdays and months.

The panels respectively show morning, afternoon, and night shifts. Markers show the average of all NIPs for a specific shift, month, and weekday. For example, the lower plot shows that the average NIP during a night shift on Wednesdays in July is 65 (light green triangle point). As in Fig 1, the seasonal behavior of NIP is more pronounced for the night shift and almost absent for the morning shift.

https://doi.org/10.1371/journal.pone.0343713.g003

Calendar variables: To predict outcomes on a specific date, our model uses the date itself (day, month and year) as input, along with the corresponding weekday. Additionally, we consider if any days within the five-day period surrounding the target date are designated holidays (national or local). Calendar variables were generated using the Python module workalendar [28], with the manual addition, when not already covered, of the local holidays in Palma or the Balearic Islands (in day/month format, these are 06/01, 20/01, 01/03, 01/05, 15/08, 12/10, 01/11, 06/12, 08/12, 26/12). The resulting input is therefore an array containing the following values: the number of days passed from the start of the dataset (an integer variable), the day of the year (integer from 0 to 365), the day of the week (integer from 0 to 6), and a bitstring of length 5 indicating which days on the five-day period surrounding the target date are holidays (5 values that are either 0 or 1).
Population variables: We use the resident population and the number of tourists, both in the Balearic Islands. The data was obtained from the website of the Instituto Nacional de Estadística [29] (the national statistics agency in Spain). Resident data have a resolution of 6 months, while tourist data have a monthly resolution. As shown in the left-hand panel of Fig 4, the resident population shows a monotonic growth in time. Daily estimates were obtained by performing a linear extrapolation of the past data. The number of tourists in the Balearic Islands shows strong periodicity (Fig 4, right-hand panel), so a simple linear extrapolation of all past data is not appropriate. Instead, to estimate the number of tourists on a particular date, we performed a linear interpolation between the following two values: (i) the number of tourists in the month preceding or containing the target date, and (ii) the number of tourists in the calendar month following the target date, but from the year before.
Weather variables: For a given target date, we incorporate the weather forecast published the day before the target, sourced from the Spanish meteorology agency Agencia Estatal de Meteorología (AEMET) [30]. The variables include maximum and minimum temperature, precipitation probability, and wind speed. The original data from AEMET consisted of text in natural language describing the predicted weather and integer numbers for the minimum and maximum temperatures. Precipitation probability and wind speed were extracted from the text as an intensity measure from 0 (absent) to 4 (very high intensity) by a local generative AI model. We employed the DeepSeek-R1 model [31] through a Python implementation. The prompt fed to the model can be found in the supplementary material (S4 Text DeepSeek prompt to preprocess weather data). This procedure does not introduce any target leakage. Since the data by AEMET already has a daily resolution, no interpolation was required.

Download:

Fig 4. (a) Number of residents in the Balearic Islands per age-group.

The data was sourced from the Instituto Nacional de Estadística website and has a six-month resolution. (b) Number of monthly incoming tourists in the Balearic Islands, same source.

https://doi.org/10.1371/journal.pone.0343713.g004

The dataset of the preprocessed exogenous variables can be found in our repository [27].

Three machine-learning models were used to make predictions for the number of incoming patients. These were implemented using either PyTorch [32] or scikit-learn [33] Python libraries. All input data were preprocessed with scikit-learn’s function StandardScaler, which standardizes features by removing the mean and scaling to unit variance. We tested the following models on the data:

Random forest (RF): an ensemble learning method that constructs a number of decision trees during training, and, for a given input, returns the average prediction across trees. The term “random” indicates the use of random subsets of the training data for the construction of each tree and the fact that random subsets of features are considered at each split in the trees. This enhances the model’s generalization ability, accuracy, and robustness [33]. We set the parameter n_estimators of scikit-learn’s function RandomForestRegressor, which is the number of trees in the forest, to 100. To choose this value, we measured the performance of the RF model (see the Section Performance metrics and bootstrapping for more details about performance metrics) on the validation dataset for values of the n_estimators hyperparameter in the interval from 25 to 200 with a spacing of 25, and then selected the value for which the model performed best. The performance of the RF method for different values of n_estimators is shown in the supplementary material (S5 Fig. Hyperparameter tuning for RF and SVR models, left-hand panel).
Support vector regressors (SVR): a type of support vector machine adapted for regression, whose goal is to predict continuous rather than categorical outputs. We set the parameter degree of scikit-learn’s function SVR, representing the degree of the polynomial kernel function, to 3. As we did for the RF method, to choose this value, we measured the performance of the SVR model (see the Section Performance metrics and bootstrapping for more details about performance metrics) on the validation dataset for integer values of the degree hyperparameter from one to ten, and then selected the value for which the model performed best. The performance of the SVR method for different values of degree is shown in the supplementary material (S5 Fig. Hyperparameter tuning for RF and SVR models, right-hand panel). We left the default kernel set to RBF.
Feedforward neural network (FNN): a basic artificial neural network in which information flows in one direction, from input to output. It consists of interconnected nodes of consecutive layers, where each node applies an activation function. The network is trained to make predictions using backpropagation, adjusting weights to minimize prediction errors. FNNs are commonly used in machine learning for tasks such as classification and regression. For our models, we used a 14-layer structure, implemented with the PyTorch module [32]. Details of the implementation can be found in S6 Text FNN detailed structure. The FNN was trained for 130 epochs. To choose this value, we trained the model for 500 epochs, measuring at each epoch the loss function on both the training and the validation datasets. The minimum loss function on the validation dataset is obtained at 130 epochs. A figure showing the loss function curves can be found in the supplementary material (see S7 Fig. Hyperparameter tuning for the FNN model).

For all three models, we used four different combinations of input variables: (i) all variables (calendar variables, resident and tourist populations, weather forecasts), (ii) all variables, except the weather forecasts, (iii) all variables, except the tourist population, and (iv) only calendar variables and the resident population, i.e., excluding both weather forecasts and information on tourists. In the following text, the combination of input variables is denoted as a suffix following the name of the corresponding model. ‘All’, ‘No W’, ‘No T’, and ‘No W-No T’ indicate the combinations (i) to (iv) respectively. For example, the random forest method that uses all variables except weather forecasts is denoted as RF-No W, while the feedforward neural network including all input variables (calendar, weather forecasts, and tourist data) is denoted as FNN-All.

To compare our non-time-series models to commonly used time-series models, we implemented a seasonal autoregressive integrated moving average (SARIMA) model [34] without exogenous variables. This model was parameterized as SARIMA. The non-seasonal orders were selected as a minimal configuration to account for a linear trend and short-term temporal dependencies. The seasonal orders with a period of were specified to capture the weekly patterns evident from Fig 3.

We also implement a SARIMAX model with the same parametrization using calendar and population data as exogenous variables. We exclude weather data for this model for two reasons: (i) only 1-day weather forecasts are included in our study, thus it is not clear how to construct a meaningful SARIMAX model for long prediction horizons which would include weather data; (ii) as shown in the results section, weather data turned out not to be statistically significant to improve the quality of the predictions and can thus be discarded as an input variable.

Both the SARIMA and SARIMAX models were implemented using the Python module statsmodels [35]. In order to make a prediction for a specific date in the test period, the models were trained using the patient arrival numbers from the 365 days preceding this target date. The full implementation of these models, including seeds and version details, can be found in our repository [27].

In the figures, tables, and the following text, we denote as SARIMA- and SARIMAX- the SARIMA and the SARIMAX models described in the previous section with a prediction horizon of days. We compare their predictions with our non-time-series models for , , and days in the future, to test if our models perform better than SARIMA and SARIMAX as the prediction horizon increases. From existing literature [1–18], we expect the SARIMA model to outperform our three models for short prediction horizons, but to gradually lose accuracy as the horizon increases. Ultimately, we expect that the SARIMA models will be outperformed by the non-time-series models, or at the very least that the accuracy of both types of models will become comparable. We anticipate broadly similar behavior for SARIMAX; however, overfitting effects may adversely affect its predictive accuracy, even over short time horizons.

Performance metrics and bootstrapping

The accuracy of the predictive models is evaluated using the symmetric mean absolute percentage error (SMAPE):

(1)

where is the actual NIP for a day and shift (or risk group), is the predicted NIP for that shift (or risk group) on that day, and is the number days in the test dataset. SMAPE is a metric similar to MAPE that avoids issues of dividing by zero. This is suitable for our dataset as it is possible to have no incoming patients on a given day, especially for the night shift or if the analysis is restricted to cohort of high-risk patients. The SMAPE was calculated for 1000 bootstrap samples. Each of these bootstrap samples was generated by sampling random days from the test dataset with equal probability and with replacement. Details concerning random seeds and sampling functions can be found in our repository [27]. We also employ the root-mean-squared error (RMSE) and the mean-average-error (MAE) as secondary performance measures:

(2)

For specific pairs of models, we perform a Diebold–Mariano (DM) test [36] to determine if the costs associated with prediction errors from the two models are statistically different from one another. As the cost function we used the symmetric mean average percentage error (SMAPE) as it is the main performance metric used in the results section. The test provides a p-value for the null hypothesis that the two forecast models have an equivalent average cost. If the null hypothesis is accepted at the significance level we say that the two models have equal predictive accuracy. Since we are testing a total of 243 null hypotheses with these DM tests, we apply a Bonferroni correction for multiple-hypothesis testing [37]. For clarity, we note that equal predictive accuracy as determined by the DM test is not transitive, that is to say if model A is identified as equivalent to model B by the test, and model B as equivalent to model C, then models A and C need not be equivalent to one another [36].

For a given pair of prediction models, DM tests are run for each bootstrap sample, returning a total of 1000 p-values for that pair of models. We measured the degree of equivalence between the two models by the fraction of the DM test results among those 1000 bootstrap samples that led to the acceptance of the null hypothesis of equal predictive accuracy between the models. In future sections, we state that two models have equal predictive accuracy if they satisfy the null hypothesis in more than 75% of the samples. The Bonferroni-corrected Diebold–Mariano test enforces strong control of false positives within each sample. The additional requirement that the null be rejected in at least 75% of resamples serves as a complementary robustness criterion, ensuring that statistical significance is not driven by sample-specific variability, while avoiding the overly restrictive requirement of near-uniform rejection that would be incompatible with the finite-sample power of the DM test.

Results

Non-time-series models

Table 2 shows the SMAPEs for each patient cohort, both for shift-based and risk-group-based predictions. The full set of performance metrics (SMAPE, RMSE, MAE) and the associated errors over the bootstrap samples can be found in the supplementary material (S8 Table Full set of SMAPEs, RMSEs, and MAEs). The values in Table 2 are the mean over the 1000 bootstrap samples. In this subsection, we focus on the upper half of the table, showing the SMAPEs for the non-time-series models. For each shift and risk group, the non-time-series model with the lowest SMAPE is always an RF model.

Download:

Table 2. SMAPE (Eq. 1) for the different prediction methods. Each row corresponds to a specific method. The RF, SVR, and FNN models were run with different combinations of input data, as explained in the text. ‘All’ indicates all input variables (calendar, resident and tourist population, weather forecast). ‘No W’ means that weather data was excluded, ‘No T’ means that resident and tourist populations were excluded. The last eight rows are for the SARIMA-

and SARIMAX-

time-series models, where the integer number (n = 1, 7, 14, 28) indicates the prediction horizon in days. Columns 3 to 5 are the SMAPEs for the predictions of the total NIP (across all age groups) in the different shifts. The last three columns show the SMAPEs for the three age cohorts discussed in the section on Age-specific predictions. Blue cells indicate that, according to the DM test, a model has the same prediction accuracy as the RF-No W model for that shift or risk group in at least 90% of the bootstrap samples. Purple cells indicate that the same condition is satisfied for both the RF-No W and RF-No W-No T models. The focus on these two models is justified in the Results section.

https://doi.org/10.1371/journal.pone.0343713.t002

Fig 5 highlights which pairs of models have equal predictive accuracy according to the DM test among the non-time-series models. The full set of results of the DM tests, including the percentages of bootstrap samples that satisfy the null hypotheses, can be found in the Supplementary Material (S9 Table Full set of DM p-values). Our goal is to select the simplest model, both in terms of input variables and method, that either has the lowest SMAPE for all shifts or has equal predictive accuracy to the one that satisfies this condition. SVRs always perform worse than RF models (SVRs are found to have a higher SMAPE and never have equal predictive accuracy as the RF with the lowest SMAPE). For some particular input combinations and shifts (or risk groups), FNNs have equal predictive accuracy as the RF with the lowest SMAPE. Nevertheless, FNNs are more computationally expensive than RFs, so we focus only on RF models for the rest of the subsection.

Download:

Fig 5. This figure highlights which combinations of non-time-series models and input variables have equal predictive accuracy according to the DM test (shift-based predictions in panel (a) and risk-group-based predictions in panel.

(b)) Each white disk corresponds to a different combination, as indicated on the axes. Since the test is symmetric, we only show each comparison once and hence the lower-right part of each diagram is empty. The colored circles inside the white disks indicate for which shifts or risk groups the two models represented by the disk have equal predictive accuracy for at least 75% of bootstrap samples.

https://doi.org/10.1371/journal.pone.0343713.g005

In what follows, we analyse the results shift by shift and risk group by risk group, starting with the morning shift. For the latter, RF-No W yields the lowest SMAPE, suggesting that the inclusion of weather data induces overfitting. Furthermore, RF-No W exhibits equal predictive accuracy to that of all other RF models. In particular, further excluding tourist variables does not lead to a statistically significant loss of accuracy. Therefore, both sets of variables can be safely omitted.

For the afternoon shift, RF-All achieves the lowest SMAPE but has a predictive accuracy equal to that of RF-No W. This means that the increase in SMAPE caused by excluding weather information is not statistically relevant. Nevertheless, tourist data significantly improve the accuracy according to the DM test.

RF-All again achieves the lowest SMAPE for the night shift, although its predictive accuracy does not differ significantly from that of RF-No W and RF-No W-No T. We also find that RF-No T performs significantly worse, suggesting that weather data causes overfitting in the absence of tourist variables. By contrast, the inclusion of weather data slightly improves the SMAPE, but this improvement is not statistically significant according to the DM test.

We next focus on risk-group-based predictions. For both the low- and medium-risk groups, RF-All attains the lowest SMAPE. However, its predictive accuracy is not statistically different from that of the models excluding weather or tourist variables, mirroring the pattern observed for the morning and night shifts. Therefore, for these risk groups, both sets of variables can be excluded without a statistically significant loss in predictive performance.

For the high-risk group, the situation changes. RF-No T attains the lowest SMAPE. However, according to the DM test, its predictive accuracy is equal to that of RF-No W, and significantly smaller than that of RF-All (see panel b of Fig 5). This indicates that combining weather and tourist data (RF-All) leads to overfitting for this risk group. Since weather forecasts are available only one day ahead, whereas tourist data can be predicted on a monthly scale, RF-No W emerges as the most practical and robust choice for this group.

Summarising, for all shifts and risk groups, RF-No W proved to be either the model with the lowest SMAPE or to have equal predictive accuracy as the one with the lowest SMAPE. Therefore, the inclusion of weather data does not significantly improve prediction accuracy for any shift or risk group and can be discarded as an input variable.

Except for the afternoon shift and the high-risk group, RF-No W-No T also proved to be either the model with the lowest SMAPE or to have equal predictive accuracy as the one with the lowest SMAPE. Therefore, unlike weather data, the inclusion of tourist data significantly improves prediction accuracy for the afternoon shift and the high-risk group.

We also notice that the difference in SMAPE between the RF-No W models and RF-No W-No T is always lower than 0.5. Since the daily NIP for any shift or risk group is always lower than 250 patients, this difference in SMAPE corresponds to less than two patients per shift or risk group. This quantity, although statistically significant according to the DM test, is small in terms of resource allocation planning. This suggests that, while the inclusion of tourist data has a statistically significant effect, its overall impact is much weaker than implied by public perception and media narratives [38,39]. This does not mean that tourists have no effect on patient inflow; rather, because of their strong seasonality, much of the information contained in tourist data is already captured by calendar variables.

Fig 6 illustrates how predictions of the RF-No W model align with shift-based data in two distinct time windows (May-June 2019 in panel (a) and November-December 2019 in panel (b)). The points represent the actual NIP, while the colored bars indicate the RMSE around the predicted value (, , for the morning, afternoon, and night shift respectively). Fig 7 illustrates how predictions of the RF-No W align with risk-group-based data in the same two time windows. The points represent again the actual NIP, while the colored bars indicate the RMSE around the predicted value (, , for the low, medium, and high risk groups respectively). The selected model, RF-No W, relies exclusively on calendar and population variables, both resident and tourist, which enables the prediction of NIP for any future date, provided that no major events disrupt tourism or population growth patterns, such as the COVID-19 pandemic.

Download:

Fig 6. The figures illustrate how predictions of the optimal model (RF-No W) align with true data in two distinct time windows (May-June in panel (a) and November-December in panel.

(b)). The points represent the actual NIP, while the colored bars indicate the RMSE around the predicted value.

https://doi.org/10.1371/journal.pone.0343713.g006

Download:

Fig 7. These figures illustrate how predictions of the optimal model align with true data in two distinct time windows (May-June in panel (a) and November-December in panel.

(b)). The points represent the actual NIP, while the colored bars indicate the RMSE around the predicted value.

https://doi.org/10.1371/journal.pone.0343713.g007

Comparison with time-series models

The goal of this section is to compare the RF-No W and RF-No W-No T models with simple time-series models. As discussed in the Methods section, we compare our models with SARIMA and SARIMAX because these models are widely used in the relevant literature [1–18]. We parametrized these models as SARIMA and SARIMAX to represent a simple model that captures weekly oscillations in the NIP. Since the goal of this paper is to assess which external variables are relevant for non-time-series models for the prediction of NIP over long horizons we avoid a detailed search for the optimal model in the ARIMA family. We compare SARIMA and SARIMAX to both RF-No W and RF-No W-No T rather than only to RF-No W. This is because RF-No W-No T could also be a practical option for ED managers, as it avoids the use of the additional variable tourist data. We note though that the difference between their SMAPEs is statistically significant.

We now focus on the lower half of Table 2, showing the SMAPEs of time-series models. The full set of performance metrics (SMAPE, RMSE, MAE) and the associated errors over the bootstrap samples can be found in the supplementary material (S8 Full set of SMAPEs, RMSEs, and MAEs). As expected, the SMAPEs of both SARIMA and SARIMAX increase with the prediction horizon. For all shifts and risk groups, the SMAPEs of SARIMAX are higher than the ones for SARIMA. This means that, for autoregressive models like SARIMAX, the inclusion of the exogenous variables lead to overfitting. Therefore, we will focus on SARIMA (no exogenous variables) for the rest of the study.

Fig 8 highlights which pairs among the following models have equal predictive accuracy according to the DM test: SARIMA with predictive horizons (1, 7, 14, 28), RF-No W, and RF-No W-No T. The full set of results of the DM tests, including the exact percentages of samples that satisfy the null hypotheses, can be found in the Supplementary Material (S9 Table Full set of DM p-values). Since the SMAPE of the SARIMA model increases with the prediction horizon, we determine from which horizon onward the predictive accuracy of SARIMA is equal to or below those of RF-No W and RF-No W-No T respectively.

Download:

Fig 8. This figure highlights which pairs of the following models have equal predictive accuracy according to the DM test (shift-based predictions in panel (a) and risk-group-based predictions in panel (b)): SARIMA with predictive horizons (1, 7, 14, 28), RF-No W, and RF-No W-No T.

Each white disk corresponds to a different combination, as indicated on the axes. Since the test is symmetric, we only show each comparison once and hence the lower-right part of each diagram is empty. The colored circles inside the white disks indicate for which shifts or risk groups the two models represented by the disk have equal predictive accuracy for at least 75% of the bootstrap samples.

https://doi.org/10.1371/journal.pone.0343713.g008

We start our discussion from the longest horizon. The predictive accuracy of SARIMA-28 is equal to that of both RF-No W and RF-No W-No T for any shift and risk group (Fig 8). From this we expect that, for any longer horizon, the two RF models will have either equal or higher predictive accuracy than SARIMA.

SARIMA-14 outperforms RF-No W-No T across all risk groups and for the morning shift, but has equal predictive accuracy as RF-No W for all cases. Thus, RF-No W already matches SARIMA in accuracy when forecasting 14 days ahead. This is also the case for SARIMA-7, excluding the morning shift and the low-risk group. This means that a simple model such as RF can achieve an equivalent accuracy as SARIMA, relying only on exogenous variables that are accessible or predictable.

Testing the models on post-COVID data

Making predictions of NIP in the post-COVID period would ideally involve retraining the models on post-COVID data. However, there is currently no sufficient post-COVID data (our dataset ends in December 2022, leaving us with only one year of post-COVID patient arrival numbers). For this reason, we limit ourselves to testing our models on post-COVID data without any retraining. In other words, we use the models trained on pre-COVID data as described above to make predictions for the post-COVID period. Our aim is to assess whether, and to what extent, model performance deteriorates as a result of the pandemic-induced changes in NIP patterns. Among the three non-time-series models, we only focus on the random forest model, since the latter outperformed both support vector regressors and feedforward neural networks in the pre-COVID period. For pre-COVID testing, the SARIMA model was trained on the year preceding the test dataset. This is impossible to do for post-COVID testing, since the year preceding the test dataset is the pandemic period. Thus, we tested the performance of the RF models and compared them with their performance during the pre-COVID period.

The full set of SMAPEs, RMSEs, and MAEs for the RF models tested on post-COVID data can be found in the Supplementary Material (S10 Table Full set of SMAPEs, RMSEs, and MAEs (post-COVID)). The SMAPEs worsen considerably compared to their pre-COVID values for all shifts and risk groups, excluding the night shift. For post-COVID data, we cannot identify a single optimal model. Different input combinations make the RF perform better on different shifts and risk groups, and all SMAPEs are compatible with each other according to their associated errors across the four input combinations.

In conclusion, our results highlight the importance of retraining the models using post COVID patient numbers as soon as enough data has been collected.

Conclusions

In this study, we have developed a non-time-series approach to predict patient volumes in emergency departments using exogenous variables and simple machine-learning techniques. At variance with time-series models, in the non-time-series models any increase in the prediction error arises solely from the inclusion of time-dependent exogenous variables, such as weather, or resident and tourist population. In contrast to weather, the resident and tourist populations are easily predictable on a fortnightly and monthly scale, a useful horizon for hospital resource allocation.

We found that random forests outperformed support vector regressors and feedforward neural networks for all shifts and risk groups. The RF-No W model, the random forest model using only calendar and population variables as inputs, was identified as the optimal model across the examined non-time-series models to predict the number of incoming patients for any shift or risk group. This choice was made by noting that RF-No W, for any shift or risk group, was either the model with the lowest SMAPE or had equivalent accuracy to the model with the lowest SMAPE.

The results allowed us to exclude weather forecasts as input variables. This is in contrast with a number of other studies [1,2,8,10,12,14,15,18,20–22], where weather proved to be a necessary exogenous variable for accurate forecasting of the number of incoming patients at emergency departments. We believe that our result is likely due to Mallorca’s seasonal and mild weather, for which calendar data is already a good substitute as an input variable.

We found the difference in SMAPE between the RF-No W and RF-No W-No T models always to be below 0.5. Since the daily NIP for each shift and risk group does exceeds 250 patients at the hospital we studied, this difference in SMAPE corresponds to fewer than two patients per shift or risk group per day. Although this difference is statistically significant according to the DM test, it is small from a practical point of view and has a limited impact on resource allocation planning. Therefore, the inclusion of the tourist population variable does not have the strong effect that might be expected given how the phenomenon is described by residents and mainstream media [38,39]. This does not imply that tourism has no effect on patient arrivals; rather, due to its strong seasonal pattern, most of the information provided by tourist data is already captured by calendar variables. RF-No W-No T therefore remains a viable candidate for accurate predictions when tourist data is unavailable.

We also compared RF-No W and RF-No W-No T to time-series models at different predictive horizons ranging from 1 to 28 days. Because of the overfitting caused by calendar and tourist data, SARIMA proved to be more accurate than SARIMAX for each shift, risk group, and prediction horizon. RF-No W showed equal prediction accuracy than SARIMA for each shift and risk group for horizons of 14 and 28 days. This implies equal or better prediction accuracy for any horizon longer than 14 days. We acknowledge that the SARIMA model we used could be improved with a better choice of parameters, but our current analysis is sufficient to demonstrate that relatively simple non-time-series models show comparable performance as some of the time-series models frequently employed in the literature [1–18]. We also stress that any time-series model, including SARIMA, would frequently have to be fed with new data to maintain high accuracy. Non-time-series models, once fed with population and calendar variables (both predictable with low uncertainty), do not need to be retrained with new data unless major external events alter the NIP patterns (i.e., a pandemic or a long-lasting calamity). For this reason, while time-series models are preferable for day-to-day predictions, the optimal model we have identified (RF-No W) is a viable alternative for long-term resource and personnel allocation.

A further conclusion of our work concerns the permanent impact of the pandemic on ED admission patterns. As shown in the Supplementary Material (S10 Table Full set of SMAPEs, RMSEs, and MAEs (post-COVID)), the SMAPEs of the Random Forest models, trained on pre-COVID data, increase significantly when tested on post-COVID data (night shift excluded). To overcome this issue, the models would have to be retrained on post-COVID data at a suitable point in the future, when there will be sufficient post-COVID data for adequate training, validation, and testing.

Although our methods can – in principle – be applied to other hospitals, we cannot expect our conclusions to be valid at all geographic locations. However, we believe that analogous results would likely be found in regions with similar weather and tourist-flow patterns (and provided hospital procedures are similar).

Supporting information

S1 Text. Original data columns.

https://doi.org/10.1371/journal.pone.0343713.s001

(PDF)

S2 Fig. NIP as a function of time, residents vs non-residents.

Subfigures on the left show the NIP for residents, and subfigures on the right show the NIP for non-residents. Each point corresponds to the NIP for a specific day and shift. Each subfigure shows a different shift (morning in green, afternoon in red, night in blue). The purple points between the dates March 1, 2020 and December 31, 2021 are the values registered during the assumed pandemic period, and excluded from our analysis.

https://doi.org/10.1371/journal.pone.0343713.s002

(PDF)

S3 Fig. NIP as a function of time, females vs males.

Subfigures on the left show the NIP for female patients, while subfigures on the right show the NIP for male patients. Each point corresponds to the NIP for a specific day and shift. Each subfigure shows a different shift (morning in green, afternoon in red, night in blue). The purple points between the dates March 1, 2020 and December 31, 2021 are the values registered during the pandemic, and excluded from our analysis. The behaviour of the curves on the left and right respectively are very similar to one another, thus eliminating the need to develop a different model for each sex.

https://doi.org/10.1371/journal.pone.0343713.s003

(PDF)

S4 Text. DeepSeek prompt to preprocess weather data.

https://doi.org/10.1371/journal.pone.0343713.s004

(PDF)

S5 Fig. Hyperparameter tuning for RF and SVR models.

Subfigure on the left shows the SMAPE for the validation dataset obtained from the RF model for different values of the n_estimators hyperparameter. Subfigure on the right shows the SMAPE for the validation dataset obtained from the SVR model for different values of the degree hyperparameter. The vertical grey lines indicate the optimal values.

https://doi.org/10.1371/journal.pone.0343713.s005

(PDF)

S6 Text. FNN detailed structure.

Here nn corresponds to the module torch.nn. The main hyperparameter is the number of training epochs. To tune this hyperparameter, we ran the model for 500 epochs and checked the loss function of both training and validation sets at each epoch. In order to minimize overfitting, we set the number of epochs to 130 (see S7 Fig. Hyperparameter tuning for the FNN model).

https://doi.org/10.1371/journal.pone.0343713.s006

(PDF)

S7 Fig. Hyperparameter tuning for the FNN model.

The figure shows the loss function at different training epochs for both the training (blue curve) and validation (red curve) dataset. The dashed vertical grey line indicates the optimal value, corresponding to the minimum of the validation curve.

https://doi.org/10.1371/journal.pone.0343713.s007

(PDF)

S8 Table. Full set of SMAPEs, RMSEs, and MAEs.

Performance metrics (SMAPE, RMSE, MAE) and associated standard deviations across models and input variables. The metrics have been averaged across 1000 bootstrap samples. First table: shift-based predictions. Second table: risk-group-based predictions.

https://doi.org/10.1371/journal.pone.0343713.s008

(PDF)

S9 Table. Full set of DM p-values.

For each pair of models (with the respective input variables) in the table, we report the fraction of bootstrap samples for which the two modes have equivalent predictive accuracy according to the Diebold-Mariano test. First table: shift-based predictions. Second table: risk-group-based predictions.

https://doi.org/10.1371/journal.pone.0343713.s009

(PDF)

S10 Table. Full set of SMAPEs, RMSEs, and MAEs (post-COVID).

Model performance on the post-COVID dataset. The model tested is the RF model for all four input combinations. First table: shift-based predictions. Second table: risk-group-based predictions.

https://doi.org/10.1371/journal.pone.0343713.s010

(PDF)

Acknowledgments

We acknowledge Noemi Pérez García and the Observatori de Dades Sanitàies, Serveis Centrals de Salut de les Illes Balears for providing the data used in this study. We also acknowledge the regional office of the State Meteorological Agency (AEMET) in the Balearic Islands for providing the weather forecast data.

References

1. Silva E, Pereira MF, Vieira JT, Ferreira-Coimbra J, Henriques M, Rodrigues NF. Predicting hospital emergency department visits accurately: A systematic review. Int J Health Plann Manage. 2023;38(4):904–17. pmid:36898975
- View Article
- PubMed/NCBI
- Google Scholar
2. Sudarshan VK, Brabrand M, Range TM, Wiil UK. Performance evaluation of Emergency Department patient arrivals forecasting models by including meteorological and calendar information: A comparative study. Comput Biol Med. 2021;135:104541. pmid:34166880
- View Article
- PubMed/NCBI
- Google Scholar
3. Abraham G, Byrnes GB, Bain CA. Short-term forecasting of emergency inpatient flow. IEEE Trans Inf Technol Biomed. 2009;13(3):380–8. pmid:19244023
- View Article
- PubMed/NCBI
- Google Scholar
4. Aboagye-Sarfo P, Mai Q, Sanfilippo FM, Preen DB, Stewart LM, Fatovich DM. A comparison of multivariate and univariate time series approaches to modelling and forecasting emergency department demand in Western Australia. J Biomed Inform. 2015;57:62–73. pmid:26151668
- View Article
- PubMed/NCBI
- Google Scholar
5. Choudhury A, Urena E. Forecasting hourly emergency department arrival using time series analysis. British Journal of Healthcare Management. 2020;26(1):34–43.
- View Article
- Google Scholar
6. Hertzum M. Forecasting Hourly Patient Visits in the Emergency Department to Counteract Crowding. Open Ergonomics J. 2017;10(1):1–13.
- View Article
- Google Scholar
7. Cheng Q, Argon NT, Evans CS, Liu Y, Platts-Mills TF, Ziya S. Forecasting emergency department hourly occupancy using time series analysis. Am J Emerg Med. 2021;48:177–82. pmid:33964692
- View Article
- PubMed/NCBI
- Google Scholar
8. Kam HJ, Sung JO, Park RW. Prediction of Daily Patient Numbers for a Regional Emergency Medical Center using Time Series Analysis. Healthc Inform Res. 2010;16(3):158–65. pmid:21818435
- View Article
- PubMed/NCBI
- Google Scholar
9. Sun Y, Heng BH, Seow YT, Seow E. Forecasting daily attendances at an emergency department to aid resource planning. BMC Emerg Med. 2009;9:1. pmid:19178716
- View Article
- PubMed/NCBI
- Google Scholar
10. Marcilio I, Hajat S, Gouveia N. Forecasting daily emergency department visits using calendar variables and ambient temperature readings. Acad Emerg Med. 2013;20(8):769–77. pmid:24033619
- View Article
- PubMed/NCBI
- Google Scholar
11. Rocha CN, Rodrigues F. Forecasting emergency department admissions. J Intell Inf Syst. 2021;56(3):509–28.
- View Article
- Google Scholar
12. Xu Q, Tsui K, Jiang W, Guo H. A Hybrid Approach for Forecasting Patient Visits in Emergency Department. Quality & Reliability Eng. 2016;32(8):2751–9.
- View Article
- Google Scholar
13. Yucesan M, Gul M, Celik E. A multi-method patient arrival forecasting outline for hospital emergency departments. International Journal of Healthcare Management. 2018;13(sup1):283–95.
- View Article
- Google Scholar
14. Álvarez-Chaves H, Muñoz P, R-Moreno MD. Machine learning methods for predicting the admissions and hospitalisations in the emergency department of a civil and military hospital. J Intell Inf Syst. 2023;61(3):881–900.
- View Article
- Google Scholar
15. Hu Y, Cato KD, Chan CW, Dong J, Gavin N, Rossetti SC, et al. Use of Real-Time Information to Predict Future Arrivals in the Emergency Department. Ann Emerg Med. 2023;81(6):728–37. pmid:36669911
- View Article
- PubMed/NCBI
- Google Scholar
16. Jilani T, Housley G, Figueredo G, Tang P-S, Hatton J, Shaw D. Short and Long term predictions of Hospital emergency department attendances. Int J Med Inform. 2019;129:167–74. pmid:31445251
- View Article
- PubMed/NCBI
- Google Scholar
17. Juang W-C, Huang S-J, Huang F-D, Cheng P-W, Wann S-R. Application of time series analysis in modelling and forecasting emergency department visits in a medical centre in Southern Taiwan. BMJ Open. 2017;7(11):e018628. pmid:29196487
- View Article
- PubMed/NCBI
- Google Scholar
18. Calegari R, Fogliatto FS, Lucini FR, Neyeloff J, Kuchenbecker RS, Schaan BD. Forecasting Daily Volume and Acuity of Patients in the Emergency Department. Comput Math Methods Med. 2016;2016:3863268. pmid:27725842
- View Article
- PubMed/NCBI
- Google Scholar
19. Sharafat AR, Bayati M. PatientFlowNet: A Deep Learning Approach to Patient Flow Prediction in Emergency Departments. IEEE Access. 2021;9:45552–61.
- View Article
- Google Scholar
20. McCarthy ML, Zeger SL, Ding R, Aronsky D, Hoot NR, Kelen GD. The challenge of predicting demand for emergency department services. Acad Emerg Med. 2008;15(4):337–46. pmid:18370987
- View Article
- PubMed/NCBI
- Google Scholar
21. Wargon M, Casalino E, Guidet B. From model to forecasting: a multicenter study in emergency departments. Acad Emerg Med. 2010;17(9):970–8. pmid:20836778
- View Article
- PubMed/NCBI
- Google Scholar
22. Erkamp NS, van Dalen DH, de Vries E. Predicting emergency department visits in a large teaching hospital. Int J Emerg Med. 2021;14(1):34. pmid:34118866
- View Article
- PubMed/NCBI
- Google Scholar
23. Graham B, Bond R, Quinn M, Mulvenna M. Using Data Mining to Predict Hospital Admissions From the Emergency Department. IEEE Access. 2018;6:10458–69.
- View Article
- Google Scholar
24. Ekström A, Kurland L, Farrokhnia N, Castrén M, Nordberg M. Forecasting emergency department visits using internet data. Ann Emerg Med. 2015;65(4):436-442.e1. pmid:25487026
- View Article
- PubMed/NCBI
- Google Scholar
25. Ghada W, Estrella N, Pfoerringer D, Kanz K-G, Bogner-Flatz V, Ankerst DP, et al. Effects of weather, air pollution and Oktoberfest on ambulance-transported emergency department admissions in Munich, Germany. Sci Total Environ. 2021;755(Pt 2):143772. pmid:33229084
- View Article
- PubMed/NCBI
- Google Scholar
26. CEIm-IB document for ethical approval. https://csv.caib.es/concsvfront/view.xhtml?hash=f5ab8ef62789149a8df0adc5574a934e67383035268332413314cdeae817b67b
27. Crisafulli P. ComplexParide/son_espases: Predicting the number of incoming patients in the ED of Son Espases University Hospital (HUSE). Zenodo. 2026.
- View Article
- Google Scholar
28. Workalendar: Worldwide holidays and workdays computational toolkit. https://workalendar.github.io/workalendar/
29. Instituto Nacional de Estadística. Instituto Nacional de Estadística. https://www.ine.es
30. Agencia Estatal de Meteorología (AEMET). https://opendata.aemet.es
31. Guo D. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv. 2025.
- View Article
- Google Scholar
32. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems, 2019. https://proceedings.neurips.cc/paper_files/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf
- View Article
- Google Scholar
33. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12(85):2825–30.
- View Article
- Google Scholar
34. Wilson GT. Time Series Analysis: Forecasting and Control, 5th Edition, by Box GEP, Jenkins GM, Reinsel GC, Ljung GM. John Wiley and Sons Inc.: Hoboken, New Jersey. 712. ISBN: 978‐1‐118‐67502‐1. Journal of Time Series Analysis. 2016 Mar;37(5):709–11. Available from: http://dx.doi.org/10.1111/jtsa.12194
- View Article
- Google Scholar
35. Seabold S, Perktold J. Statsmodels: Econometric and statistical modeling with python. In: 2010.
- View Article
- Google Scholar
36. Diebold FX, Mariano RS. Comparing Predictive Accuracy. Journal of Business & Economic Statistics. 1995;13(3):253–63.
- View Article
- Google Scholar
37. Bonferroni CE. Teoria statistica delle classi e calcolo delle probabilità. Firenze: Seeber. 1936.
38. Los turistas llenan Urgencias al dejar de ser derivados a las clínicas privadas. 2023. https://www.ultimahora.es/noticias/local/2023/07/04/1969045/sanidad-baleares-turistas-llenan-urgencias.html
39. Colapso en las urgencias de Magaluf por turistas borrachos. 2019.

[ref1] 1. Silva E, Pereira MF, Vieira JT, Ferreira-Coimbra J, Henriques M, Rodrigues NF. Predicting hospital emergency department visits accurately: A systematic review. Int J Health Plann Manage. 2023;38(4):904–17. pmid:36898975
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. Sudarshan VK, Brabrand M, Range TM, Wiil UK. Performance evaluation of Emergency Department patient arrivals forecasting models by including meteorological and calendar information: A comparative study. Comput Biol Med. 2021;135:104541. pmid:34166880
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref3] 3. Abraham G, Byrnes GB, Bain CA. Short-term forecasting of emergency inpatient flow. IEEE Trans Inf Technol Biomed. 2009;13(3):380–8. pmid:19244023
View Article
PubMed/NCBI
Google Scholar

[10] View Article

[11] PubMed/NCBI

[12] Google Scholar

[ref4] 4. Aboagye-Sarfo P, Mai Q, Sanfilippo FM, Preen DB, Stewart LM, Fatovich DM. A comparison of multivariate and univariate time series approaches to modelling and forecasting emergency department demand in Western Australia. J Biomed Inform. 2015;57:62–73. pmid:26151668
View Article
PubMed/NCBI
Google Scholar

[14] View Article

[15] PubMed/NCBI

[16] Google Scholar

[ref5] 5. Choudhury A, Urena E. Forecasting hourly emergency department arrival using time series analysis. British Journal of Healthcare Management. 2020;26(1):34–43.
View Article
Google Scholar

[18] View Article

[19] Google Scholar

[ref6] 6. Hertzum M. Forecasting Hourly Patient Visits in the Emergency Department to Counteract Crowding. Open Ergonomics J. 2017;10(1):1–13.
View Article
Google Scholar

[21] View Article

[22] Google Scholar

[ref7] 7. Cheng Q, Argon NT, Evans CS, Liu Y, Platts-Mills TF, Ziya S. Forecasting emergency department hourly occupancy using time series analysis. Am J Emerg Med. 2021;48:177–82. pmid:33964692
View Article
PubMed/NCBI
Google Scholar

[24] View Article

[25] PubMed/NCBI

[26] Google Scholar

[ref8] 8. Kam HJ, Sung JO, Park RW. Prediction of Daily Patient Numbers for a Regional Emergency Medical Center using Time Series Analysis. Healthc Inform Res. 2010;16(3):158–65. pmid:21818435
View Article
PubMed/NCBI
Google Scholar

[28] View Article

[29] PubMed/NCBI

[30] Google Scholar

[ref9] 9. Sun Y, Heng BH, Seow YT, Seow E. Forecasting daily attendances at an emergency department to aid resource planning. BMC Emerg Med. 2009;9:1. pmid:19178716
View Article
PubMed/NCBI
Google Scholar

[32] View Article

[33] PubMed/NCBI

[34] Google Scholar

[ref10] 10. Marcilio I, Hajat S, Gouveia N. Forecasting daily emergency department visits using calendar variables and ambient temperature readings. Acad Emerg Med. 2013;20(8):769–77. pmid:24033619
View Article
PubMed/NCBI
Google Scholar

[36] View Article

[37] PubMed/NCBI

[38] Google Scholar

[ref11] 11. Rocha CN, Rodrigues F. Forecasting emergency department admissions. J Intell Inf Syst. 2021;56(3):509–28.
View Article
Google Scholar

[40] View Article

[41] Google Scholar

[ref12] 12. Xu Q, Tsui K, Jiang W, Guo H. A Hybrid Approach for Forecasting Patient Visits in Emergency Department. Quality & Reliability Eng. 2016;32(8):2751–9.
View Article
Google Scholar

[43] View Article

[44] Google Scholar

[ref13] 13. Yucesan M, Gul M, Celik E. A multi-method patient arrival forecasting outline for hospital emergency departments. International Journal of Healthcare Management. 2018;13(sup1):283–95.
View Article
Google Scholar

[46] View Article

[47] Google Scholar

[ref14] 14. Álvarez-Chaves H, Muñoz P, R-Moreno MD. Machine learning methods for predicting the admissions and hospitalisations in the emergency department of a civil and military hospital. J Intell Inf Syst. 2023;61(3):881–900.
View Article
Google Scholar

[49] View Article

[50] Google Scholar

[ref15] 15. Hu Y, Cato KD, Chan CW, Dong J, Gavin N, Rossetti SC, et al. Use of Real-Time Information to Predict Future Arrivals in the Emergency Department. Ann Emerg Med. 2023;81(6):728–37. pmid:36669911
View Article
PubMed/NCBI
Google Scholar

[52] View Article

[53] PubMed/NCBI

[54] Google Scholar

[ref16] 16. Jilani T, Housley G, Figueredo G, Tang P-S, Hatton J, Shaw D. Short and Long term predictions of Hospital emergency department attendances. Int J Med Inform. 2019;129:167–74. pmid:31445251
View Article
PubMed/NCBI
Google Scholar

[56] View Article

[57] PubMed/NCBI

[58] Google Scholar

[ref17] 17. Juang W-C, Huang S-J, Huang F-D, Cheng P-W, Wann S-R. Application of time series analysis in modelling and forecasting emergency department visits in a medical centre in Southern Taiwan. BMJ Open. 2017;7(11):e018628. pmid:29196487
View Article
PubMed/NCBI
Google Scholar

[60] View Article

[61] PubMed/NCBI

[62] Google Scholar

[ref18] 18. Calegari R, Fogliatto FS, Lucini FR, Neyeloff J, Kuchenbecker RS, Schaan BD. Forecasting Daily Volume and Acuity of Patients in the Emergency Department. Comput Math Methods Med. 2016;2016:3863268. pmid:27725842
View Article
PubMed/NCBI
Google Scholar

[64] View Article

[65] PubMed/NCBI

[66] Google Scholar

[ref19] 19. Sharafat AR, Bayati M. PatientFlowNet: A Deep Learning Approach to Patient Flow Prediction in Emergency Departments. IEEE Access. 2021;9:45552–61.
View Article
Google Scholar

[68] View Article

[69] Google Scholar

[ref20] 20. McCarthy ML, Zeger SL, Ding R, Aronsky D, Hoot NR, Kelen GD. The challenge of predicting demand for emergency department services. Acad Emerg Med. 2008;15(4):337–46. pmid:18370987
View Article
PubMed/NCBI
Google Scholar

[71] View Article

[72] PubMed/NCBI

[73] Google Scholar

[ref21] 21. Wargon M, Casalino E, Guidet B. From model to forecasting: a multicenter study in emergency departments. Acad Emerg Med. 2010;17(9):970–8. pmid:20836778
View Article
PubMed/NCBI
Google Scholar

[75] View Article

[76] PubMed/NCBI

[77] Google Scholar

[ref22] 22. Erkamp NS, van Dalen DH, de Vries E. Predicting emergency department visits in a large teaching hospital. Int J Emerg Med. 2021;14(1):34. pmid:34118866
View Article
PubMed/NCBI
Google Scholar

[79] View Article

[80] PubMed/NCBI

[81] Google Scholar

[ref23] 23. Graham B, Bond R, Quinn M, Mulvenna M. Using Data Mining to Predict Hospital Admissions From the Emergency Department. IEEE Access. 2018;6:10458–69.
View Article
Google Scholar

[83] View Article

[84] Google Scholar

[ref24] 24. Ekström A, Kurland L, Farrokhnia N, Castrén M, Nordberg M. Forecasting emergency department visits using internet data. Ann Emerg Med. 2015;65(4):436-442.e1. pmid:25487026
View Article
PubMed/NCBI
Google Scholar

[86] View Article

[87] PubMed/NCBI

[88] Google Scholar

[ref25] 25. Ghada W, Estrella N, Pfoerringer D, Kanz K-G, Bogner-Flatz V, Ankerst DP, et al. Effects of weather, air pollution and Oktoberfest on ambulance-transported emergency department admissions in Munich, Germany. Sci Total Environ. 2021;755(Pt 2):143772. pmid:33229084
View Article
PubMed/NCBI
Google Scholar

[90] View Article

[91] PubMed/NCBI

[92] Google Scholar

[ref26] 26. CEIm-IB document for ethical approval. https://csv.caib.es/concsvfront/view.xhtml?hash=f5ab8ef62789149a8df0adc5574a934e67383035268332413314cdeae817b67b

[ref27] 27. Crisafulli P. ComplexParide/son_espases: Predicting the number of incoming patients in the ED of Son Espases University Hospital (HUSE). Zenodo. 2026.
View Article
Google Scholar

[95] View Article

[96] Google Scholar

[ref28] 28. Workalendar: Worldwide holidays and workdays computational toolkit. https://workalendar.github.io/workalendar/

[ref29] 29. Instituto Nacional de Estadística. Instituto Nacional de Estadística. https://www.ine.es

[ref30] 30. Agencia Estatal de Meteorología (AEMET). https://opendata.aemet.es

[ref31] 31. Guo D. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv. 2025.
View Article
Google Scholar

[101] View Article

[102] Google Scholar

[ref32] 32. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems, 2019. https://proceedings.neurips.cc/paper_files/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf
View Article
Google Scholar

[104] View Article

[105] Google Scholar

[ref33] 33. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12(85):2825–30.
View Article
Google Scholar

[107] View Article

[108] Google Scholar

[ref34] 34. Wilson GT. Time Series Analysis: Forecasting and Control, 5th Edition, by Box GEP, Jenkins GM, Reinsel GC, Ljung GM. John Wiley and Sons Inc.: Hoboken, New Jersey. 712. ISBN: 978‐1‐118‐67502‐1. Journal of Time Series Analysis. 2016 Mar;37(5):709–11. Available from: http://dx.doi.org/10.1111/jtsa.12194
View Article
Google Scholar

[110] View Article

[111] Google Scholar

[ref35] 35. Seabold S, Perktold J. Statsmodels: Econometric and statistical modeling with python. In: 2010.
View Article
Google Scholar

[113] View Article

[114] Google Scholar

[ref36] 36. Diebold FX, Mariano RS. Comparing Predictive Accuracy. Journal of Business & Economic Statistics. 1995;13(3):253–63.
View Article
Google Scholar

[116] View Article

[117] Google Scholar

[ref37] 37. Bonferroni CE. Teoria statistica delle classi e calcolo delle probabilità. Firenze: Seeber. 1936.

[ref38] 38. Los turistas llenan Urgencias al dejar de ser derivados a las clínicas privadas. 2023. https://www.ultimahora.es/noticias/local/2023/07/04/1969045/sanidad-baleares-turistas-llenan-urgencias.html

[ref39] 39. Colapso en las urgencias de Magaluf por turistas borrachos. 2019.

Figures

Abstract

Introduction

Materials and methods

Data preprocessing and behavior

Age cohorts and hospitalization risk

Input variables and models

Performance metrics and bootstrapping

Results

Non-time-series models

Comparison with time-series models

Testing the models on post-COVID data

Conclusions

Supporting information

S1 Text. Original data columns.

S2 Fig. NIP as a function of time, residents vs non-residents.

S3 Fig. NIP as a function of time, females vs males.

S4 Text. DeepSeek prompt to preprocess weather data.

S5 Fig. Hyperparameter tuning for RF and SVR models.

S6 Text. FNN detailed structure.

S7 Fig. Hyperparameter tuning for the FNN model.

S8 Table. Full set of SMAPEs, RMSEs, and MAEs.

S9 Table. Full set of DM p-values.

S10 Table. Full set of SMAPEs, RMSEs, and MAEs (post-COVID).

Acknowledgments

References