Trends, structural changes, and assessment of time series models for forecasting hospital discharge due to death at a Mexican tertiary care hospital

Background Data on hospital discharges can be used as a valuable instrument for hospital planning and management. The quantification of deaths can be considered a measure of the effectiveness of hospital intervention, and a high percentage of hospital discharges due to death can be associated with deficiencies in the quality of hospital care. Objective To determine the overall percentage of hospital discharges due to death in a Mexican tertiary care hospital from its opening, to describe the characteristics of the time series generated from the monthly percentage of hospital discharges due to death and to make and evaluate predictions. Methods This was a retrospective study involving the medical records of 81,083 patients who were discharged from a tertiary care hospital from April 2007 to December 2019 (first 153 months of operation). The records of the first 129 months (April 2007 to December 2017) were used for the analysis and construction of the models (training dataset). In addition, the records of the last 24 months (January 2018 to December 2019) were used to evaluate the predictions made (test dataset). Structural change was identified (Chow test), ARIMA models were adjusted, predictions were estimated with and without considering the structural change, and predictions were evaluated using error indices (MAE, RMSE, MAPE, and MASE). Results The total percentage of discharges due to death was 3.41%. A structural change was observed in the time series (March 2009, p>0.001), and ARIMA(0,0,0)(1,1,2)12 with drift models were adjusted with and without consideration of the structural change. The error metrics favored the model that did not consider the structural change (MAE = 0.63, RMSE = 0.81, MAPE = 25.89%, and MASE = 0.65). Conclusion Our study suggests that the ARIMA models are an adequate tool for future monitoring of the monthly percentage of hospital discharges due to death, allowing us to detect observations that depart from the described trend and identify future structural changes.

Introduction Data on hospital discharges due to death can be used as a valuable instrument for hospital planning and management [1]. In Mexico, the General Directorate of Health Information (Dirección General de Información en Salud-DGIS; Spanish acronym) is the operational body of the Ministry of Health (Secretaría de Salud-SSA; Spanish acronym) that is responsible for generating statistics on health and has various information subsystems, including the Automated Hospital Discharge System (Sistema Automatizado de Egresos Hospitalarios-SAEH; Spanish acronym) [2]. The number of national hospital discharges recorded by Mexican SSA hospitals as primary sources in the SAEH is estimated to be 3 million cases per year, and of these, approximately 2% are deaths [3]. There are descriptive reports on national and regional hospital discharges [2,4], but there are few inferential studies on trends, the identification of associated variables, or predictions about such hospital discharges. The Bajío Regional Hospital of High Specialty (Hospital Regional de Alta Especialidad del Bajío-HRAEB) has provided clinical, diagnostic, and tertiary treatment services since April 2007. It has 184 census beds, does not have an emergency department, and discharges patients with complex pathologies. Its source documents for recording hospital discharges are the clinical file and death certificate. The corresponding data from on these documents are captured in the computer platform of the SAEH by the Medical Statistics area personnel of the HRAEB according to the guidelines of the DGIS [5].
Since 1986, the Health Care Financing Administration has incorporated the hospital mortality rate as a qualitative comparator of American hospitals, which has encouraged the use of hospital care outcome indicators across the world [6]. Hospital mortality is one of the most frequently used indicators of quality of care [7], and the quantification of deaths can be considered a measure of the effectiveness of hospital intervention, although it should not be forgotten that this indicator is influenced by other factors, such as the pathology being treated, population structure and accessibility to the hospital unit [8]. A high percentage of hospital discharges due to death can be associated with deficiencies in the quality of hospital care [9].
Since in this case the percentage of discharges due to death it is a change of percentage over time, it can be defined as a time series. To model this mortality series, linear models are typically constructed [10]. Some studies have even analyzed the trends of hospital discharges for a specific cause [11], and other authors have estimated predictions [12]. Usually Autoregressive integrated moving average (ARIMA) models are used for analyzing and forecasting time series data and are also capable of modelling a wide range of seasonal data [13][14][15]. The Box-Jenkins (ARIMA) econometric modelling is a forecasting technique that completely ignores independent variables in making forecast. It takes into account historical data and decomposes it into Autoregressive (AR) process, where there is a memory of past events; an Integrated (I) process, which accounts for stabilizing or making the data stationary, making it easier to forecast; and a Moving Average (MA) of the forecast errors, such that the longer the historical data, the more accurate the forecasts will be, as it learns over time [16,17].
For the identification of the ARIMA model, two goodness-of-fit statistics that are most commonly used for the model selection are; Akaike Information Criterion (AIC) and Schwarz Bayesian Information Criterion (BIC). The AIC and BIC are determined based on a likelihood function [15]. When comparing two models, the one with the lower AIC or BIC is generally better [18]. Low AIC or BIC values suggest that a model nicely straddle the requirements of goodness-of-fit and parsimony [19].
In addition, regarding hospital deaths, there is scientific evidence that certain events, including enabling hospital admissions on weekends [20,21], varying the number and educational level of nursing staff as well as the ratio of nurses per census bed [22][23][24][25], increasing the number of hospitalized patients [26,27] and increasing the volume of surgical patients [28,29], can increase the risk of hospital mortality and therefore raise the percentage of hospital discharges due to death.
The aim of this study was to determine the overall percentage of hospital discharges due to death in a Mexican tertiary care hospital from its opening in April 2007 to December 2019, to describe the characteristics of the time series generated from the monthly percentage of hospital discharges due to death from April 2007 to December 2017 (first 129 months of operation) and to make and evaluate predictions of the monthly percentage of hospital discharges due to death from January 2018 to December 2019 (24 months later).

Patients
This retrospective study of Hispanic-Mexican patients included all the records of patients (n = 81,083) who were discharged from a tertiary care hospital (HRAEB) located in the city of León in the state of Guanajuato, Mexico, from its opening in April 2007 to December 2019 (153 months of operation). The dataset is secondary and was obtained from the national government records of the SAEH subsystem operated by DGIS [3]. This is a subsystem that compiles national hospital discharge information from Mexican hospitals as primary sources. The dataset was divided into two parts. The records of the first 129 months (April 2007 to December 2017) were used for the analysis and construction of the models (training dataset). In addition, the records of the last 24 months (January 2018 to December 2019) were used to evaluate the predictions made (test dataset).

Ethical considerations
The protocol of this study was reviewed and approved by the Institutional Research and Ethics Committees of the HRAEB and the National Institute of Public Health (Instituto Nacional de Salud Pública) of Mexico (approval numbers: CI/HRAEB/2019/046 and PT 211, respectively).

Statistical analysis
All data were analyzed with the statistical software R [30]. Initially, a descriptive analysis was implemented, and the time series associated with the monthly percentage of hospital discharges due to death was plotted. For the time series, the trend was modeled using a simple linear regression model using least squares (SLRMLS) without segmentation [10]. The assumptions of normality, homoscedasticity, and independence of the residuals were verified using the Kolmogorov-Smirnov, White, and Durbin-Watson tests, respectively. Next, the instances where possible structural changes could occur (sudden changes in the trend of the time series) were identified using the Chow test [31] with the statistical R package "strucchange", which is designed to test structural changes in linear regression models [32]. Next, additive decomposition of the series into its components was performed as follows: trend, seasonal variations (seasonality), and irregular fluctuations (random) [10]. ARIMA models were used. The name of this procedure comes from its three components: autoregression (AR), integration (I), and moving average (MA). The ARIMA(p, d, q) model is fitted to the observed data, where p, d and q represent the order of the three respective components. Regarding the integrated components of the ARIMA model, if the stochastic process associated with a time series has a unit root, we can conclude that it is a nonstationary time series. Therefore, the Dickey-Fuller test was implemented for each of the y t series to determine if it was stationary. If not, we proceeded to differentiate each series (generating a new series z t = y t −y t−1 with t = 2,3. . .,n) as many times as necessary (d) to obtain a stationary time series and thus conclude that the original series y t was integrated of order (d). In addition, autocorrelation (ACF) and partial autocorrelation (PACF) plots were constructed for each series and for those series that resulted from differentiation [33] to empirically confirm whether the series could be considered stationary and identify the possible number of lags for each of the remaining components of the ARIMA model.
In order to be able to predict the monthly percentage of discharges due to death, with and without considering the structural change identified in the series, for the forecasting, the seasonal ARIMA models were adjusted. The seasonal ARIMA(p, d, q)(P, D, Q) s process is given by and Θ(z) are polynomials of orders P and Q respectively, each containing no roots inside the unit circle. If c6 ¼0, there is an implied polynomial of order d + D in the forecast function [34]. The selection of the proper seasonal ARIMA model in each case was done by the auto.arima function from the R package "forecast" [35]. This function returns the best ARIMA model according to the AIC or BIC value [36]. The function conducts a search over a range of possible models within the order constraints and picks the most suitable one [37]. The auto.arima function is based mainly on the Hyndman-Khandakar algorithm, which combines unit element tests, minimization of corrected Akaike tests (AICc), and the maximum likelihood method to obtain the model most appropriate for the data. The criteria of goodness-of-fit, based on the information criterion, were taken into account [38]: AIC = −2ln(L)+2(p+q+P+Q+k) and BIC = −2ln(L)+(p+q+P+Q+k)ln(n) where k = 1 if c6 ¼0 and 0 otherwise, and L is the maximized likelihood of the model fitted to the differenced data (1−B s ) D (1−B) d y t . After selecting the models for each series, it was verified that the residuals did not have a dependence structure and that they followed a white noise process.
Finally, predictions were made (point forecasts and prediction intervals) of the series of the percentage of discharges due to death, with and without considering the structural change identified in the series using the function forecast from the R package "forecast" [35]. These predictions were evaluated using four error indices: two scale-dependent error indices: mean absolute error (MAE) and root mean square error (RMSE); one percentage error index: mean absolute percentage error (MAPE); and one scale-free error index: mean absolute scaled error (MASE) [39][40][41][42]. In all tests, the significance level of α = 0.05 was used.

Results
A total of 81,083 hospital discharges, of which 2,767 were deaths, were generated in the HRAEB from April 2007 to December 2019 and included in the final analysis of this study. Table 1 describes the total hospital discharges of the HRAEB per year, and a total percentage of discharges due to death of 3.41% was observed. In 2009, the highest annual percentage of hospital discharges due to death, 5.77%, was recorded. In contrast, 2018 had the lowest annual percentage of hospital discharges due to death, 2.50%. . This model describes a trend with an apparent decrease in the percentage of hospital discharges due to death over time, but when verifying the assumptions of the model, it was observed that the residuals did not meet the assumption of independence (p < 0.001). This was done through the implementation of

PLOS ONE
Time-series models for forecasting hospital discharge due to death at a tertiary care hospital Fig 3 shows the exploratory graph of the additive decomposition of the series of the monthly percentage of HRAEB hospital discharges due to death as a trend, from seasonal variations (seasonality), and from irregular fluctuations (random). An apparent increasing trend was observed from the beginning until mid-2009, and later, there was an apparent decreasing trend. There was also an apparent seasonal variation with period s = 12.
In trying to verify whether the series of the monthly percentage of hospital discharges due to death was stationary, the presence of a unit root could be identified (Dickey-Fuller = -3.4712, lag order = 5, p = 0.48), from which we could presume that this was a nonstationary time series. When differentiating the series, we concluded that the series y t was integrated of order D = 1, since it was necessary to differentiate the series once to obtain the series (y t -y t-1 ) that was stationary (Dickey-Fuller = -6.7907, lag order = 5, p <0.01). Fig 4A shows the monthly series of hospital discharges due to death at the HRAEB (y t ). Fig  4B and Fig 4C show the autocorrelation (ACF) and partial autocorrelation (PACF) functions of the time series, respectively. The ACF function of the values of the time series was cut with extreme slowness, which reaffirmed that the series was nonstationary. Similarly, Fig 5A shows the differentiated series (y t -y t-1 ), and Fig 5B and 5C show the ACF and PACF of the differentiated series, respectively. They were cut after the second phase, which reaffirmed that the differentiated series was stationary. In addition, the analysis of both figures empirically suggested that p = 1 or 2 and q = 1 or 2 may be relevant orders of the AR and MA components of an ARIMA model for the series studied. After confirming that the series of the monthly percentage of discharges due to death at the HRAEB could be modeled with a seasonal ARIMA model with integrated order D = 1, the AIC and BIC were calculated in models I (1), AR (0), AR (1), AR (2), MA (0), MA (1), and MA (2). The number of lags that minimized them occurred in the ARIMA(0,0,0)(1,1,2) 12 model, that is, when P = 1,Q = 2.
Based on the above, a seasonal ARIMA(0,0,0)(1,1,2) 12 with drift was adjusted to model the series of the monthly percentage of hospital discharges from HRAEB due to death without considering structural change. The coefficient associated with the AR component (1) Fig 6B shows the predictions for the same period with prediction intervals of 90% that resulted with an ARIMA(0,0,0)(1,1,2) 12 model with drift but considering the structural change identified at t = 24. When comparing both predictions, the 90% prediction intervals that considered the structural change in the trend were narrower. Table 2 shows the predictions for the monthly percentage of hospital discharges due to death from January 2018 to December 2019 considering the two models (no structural change and considering the structural change in the trend). The model that did not consider structural change estimated an average point prediction of the overall percentage of hospital discharges due to death of 2.62 ± 0.39% (range 1.73-3.32%) for the period from January 2018 to

PLOS ONE
Time-series models for forecasting hospital discharge due to death at a tertiary care hospital December 2019, a percentage higher than that estimated with the prediction average point generated by the model that incorporated the identified structural change 2.12 ± 0.36% (range 1.41-2.62%) for the same period. More accurate estimates (narrower prediction intervals) were obtained when using the model that incorporated the identified structural change, but compared with the actual values observed for the monthly percentage of hospital discharges due to death of 2.66 ± 0.71% (range 1.35-3.79%), very rough point estimates were observed in both cases. However, they were underestimated by the model that considered structural change.
In comparing the performance of the forecasts made for the monthly percentage of hospital discharges due to death using an ARIMA(0,0,01)(1,1,2) 12 with drift model that did not consider a structural change and an ARIMA(0,0,0)(1,1,2) 12 with drift model that did consider the structural change, the metrics listed in Table 3 were obtained, and all the calculated error indices favored the ARIMA model that did not consider the structural change, namely, MAE = 0.63, RMSE = 0.81, MAPE = 25.89%, and MASE = 0.65.

Discussion
In the study period (April 2007-December 2019), 3.41% (2,767/81,083) of hospital discharges from the HRAEB were due to death, which annually ranged from 2.50% to 5.77%, similar to the 4.10% identified by García. et al. [7] in a Spanish specialty hospital based on 24,194 episodes of hospitalization. These annual percentages are greater than the estimated 2% of hospital discharges per year due to death considering all Mexican SSA hospitals [3] and are also higher than the Mexican rate published in the journal Salud Pública de México in 1999 [2], which reported that 2.62% (35,891/1,469,161) of hospital discharges were due to death each year at the national level and 1.93% at the state level (Guanajuato, where the HRAEB is located) for 1999. There is little variability in the annual percentage of hospital discharges due to death at either the national or state level. The percentage of hospital discharges due to death at the HRAEB was higher than the national level and the state level (Guanajuato). This can be attributed to the fact that hospital discharges from the HRAEB correspond to patients with complex pathologies that require tertiary care, as suggested by Jiménez [8] in their study on the quantification of the quality of hospital services.
Regarding the monthly time series of the percentage of hospital discharges due to death in the HRAEB, a sudden change (structural change) in the trend of the time series was identified; this statistically significant change (p < 0.001) was detected at t = 24 (March 2009), which suggests that the trend of the series can be modeled by two linear segments: one increasing from April 2007 to March 2009 and another decreasing from April 2009 to December 2017. The factors that could influence the increasing trend and subsequent decreasing trend in the percentage of HRAEB hospital discharges due to death are as follows: First, the gradual hiring of specialized nursing staff in the HRAEB. Each year, the number of nurses (certified and specialized) has increased, since, as described by Aiken et al. [22], the number of nursing personnel as well as their level of professionalization affect hospital mortality. Leibson et al. [25] reaffirm the above facts and describe that a higher ratio of nurses per census bed decreases the mortality rate. Second, the constant increase in the volume of hospitalized patients in the first two years of HRAEB operation was associated with the habilitation and gradual certification of HRAEB services, consistent with the study by Taylor et al. [26], which described the association between the increase in the volume of patients and the increase in hospital mortality. Third, with the increase in the volume of surgical patients, approximately 80% of patients discharged from the HRAEB had undergone surgical procedures, which increases the likelihood of death from complex procedures, as detailed by Mcphee et al. [28], who identified the increase in the volume of surgeries as the main factor affecting mortality in one hospital. The monthly predictions of the percentage of hospital discharges due to death from January 2018 to December 2019 were estimated by fitting seasonal ARIMA models, similar to the method of Fang et al. [12], but with and without consideration of the structural change identified by Chow's test [31]. In comparing the performance of the predictions of both models with the actual percentages of hospital discharges due to death in that period (test dataset), the four calculated error metrics (MAE, RMSE, MAPE, MASE) favored the predictions made with the model that did not consider structural change. In our models, the estimated MAPE errors of approximately 25-30% may seem high but are because the values of the percentage of discharges for death were small (0-10%), which indicates that MAPE is not a good criterion for choosing the best model in this case, as described by Hyndman and Koehler [40]. The MAE and RMSE error indices are frequently used to determine which model generates the best predictions, perhaps due to their ease of calculation and interpretation, as observed in the study by Liu et al. [43]. However, a study by Hyndman [39] suggests that MASE is the best error metric to evaluate predictions in time series, such as ours, detailing that MASE values less than 1 indicate more accurate predictions. In our study, the two models constructed to predict the percentage of discharges due to death generated MASE values less than 1, but the model that did not consider structural change had a lower MASE value, suggesting that it is a more accurate model.
In future work, our data will allow us to evaluate various types of models (cubic, ARIMA, joint point), as Fang et al. did [12], and it will be necessary to identify the services that have the greatest influence on hospital mortality. It is necessary to study the percentage of hospital discharges due to death, conditioned on medical services in the HRAEB, and perform an analysis similar to that of Andrews et al. [9] that conditions mortality by specific diagnoses and procedures. In addition, the trends in mortality related to the specializations offered at the HRAEB could be specified, similar to what was described by Gonzaga et al. [44] in patients with breast cancer, Segura et al. [11] in patients with tuberculosis, and Fang et al. [12] in patients with liver cancer, to name a few.
Finally, the study has several limitations. First, since this was a retrospective study, causality could not be inferred. Second, as a secondary database was used, there could be minor capture errors inherent to the source itself. Third, since it was a single-center study in a tertiary care hospital, the results cannot be generalized to hospitals of other levels of care, although the analysis method described for analyzing the time series of mortality could be implemented to identify structural changes in hospitals of different levels of care.

Conclusions
This study expands the knowledge on hospital mortality, showing that the overall percentage of hospital discharges due to death in a Mexican tertiary care hospital is 3.41%. Through the detection of a structural change (in March 2009) in the series of the monthly percentage of hospital discharges due to death, we identified that hospital mortality at the HRAEB can be described by a growing trend from its opening (April 2007) to March 2009 and by a decreasing trend from April 2009 to December 2019, in addition to seasonal behavior. Our study suggests that the seasonal ARIMA models constructed and evaluated with their respective prediction intervals are an adequate tool for future monitoring of the monthly percentage of hospital discharges due to death, allowing us to detect observations that depart from the described trend and identify future structural changes.
Despite this was a retrospective single-center study based on a secondary database, all this information generated and its methodology can be used to improve decision-making and resource management to reduce the rate of deaths in both HRAEBs and other health institutions. Further research efforts could be also devoted to identifying the medical services that have the greatest influence on hospital mortality via various types of mathematical models.