An operationally implementable model for predicting the effects of an infectious disease on a comprehensive regional healthcare system

An operationally implementable predictive model has been developed to forecast the number of COVID-19 infections in the patient population, hospital floor and ICU censuses, ventilator and related supply chain demand. The model is intended for clinical, operational, financial and supply chain leaders and executives of a comprehensive healthcare system responsible for making decisions that depend on epidemiological contingencies. This paper describes the model that was implemented at NorthShore University HealthSystem and is applicable to any communicable disease whose risk of reinfection for the duration of the pandemic is negligible.


Introduction
Upon its emergence in 2020, the COVID-19 pandemic presented immediate challenges to the operation of NorthShore University HealthSystem (NS), a comprehensive regional healthcare system located in the northern part of Chicago, Illinois, and its suburbs. The need to forecast the expected demand on floor and ICU beds, ventilators and requisite supplies became pressing at the onset of the disease. The lack of reliable population data posed additional difficulty in implementing a usable model. Additional constraints of robustness, distributability and transparency imposed further requirements on the choice of the governing equations, solution algorithm and software implementation. During the initial stage of the pandemic, the model was delivered to the operational stakeholders daily; as time progressed, the frequency of dissemination was changed to once or twice a week, depending on the severity of the situation.
At the onset of the pandemic, the Clinical Analytics team was tasked with providing a reliable, scalable solution relevant to the local epidemiological situation [1]. While abundant literature exists on the theoretical aspects of the problem [2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21] concise but comprehensive description of challenges facing a researcher attempting to develop a workable model see, e.g., [22]. While most of the existing publications that offer applicable practical solutions focus on country-wide statistics [23,24], those dealing with local conditions are scarce. In order to find a satisfactory answer to this challenge, we had to quickly construct a flexible, scalable model easily adaptable to rapidly changing conditions that could be quickly communicated to a growing number of stakeholders while affording them an opportunity to create area-specific "back-of-an-envelope" analyses suitable for their needs. This task was accomplished by augmenting the industry-standard Susceptible-Infected-Recovered (SIR) model [2] with bootstrapping estimates and interpolation and extrapolation approximations of patient flow dynamics. The resulting model was robust enough to exhibit accuracy sufficient for predicting floor, general intensive care unit (ICU), ventilator census and mortality up to two weeks in advance. The main accomplishments of the foregoing approach were the ability to quickly adapt the model to the observed coefficient of transmission (R 0 ) prevalent in the hospital service area, compute the forward expected length of stay on the hospital floor, in the ICU and on the ventilator, and incorporate actual and projected vaccination rates into the model. The paper is organized as follows. First, we review generally accepted modeling principles for forecasting the progression of the disease (COVID- 19). Next, we provide empirical formulas for approximating dynamically observed rates of hospitalizations, ICU and ventilator placement, mortality and vaccination. Following that, we present the results and discuss their accuracy. We conclude with a summary of findings and directions for further research.

General equations
The choice of a model was informed by the requirements of specificity to the available NS population data, ease of implementation through a common tool understood by operational leaders and ease of explanation to a non-mathematical audience. In general, available models meeting those criteria include variants of the SIR model [2,3] with a time-dependent coefficient of transmission [4] and vaccination effects [5]. While a stochastic SIR model [6] presents a viable enhancement, estimating the parameters of the stochastic component may prove problematic during the onset of the pandemic. A plausible alternative, an Individual-Based Model (IBM) [7], requires substantially more effort devoted to implementation and analysis [8], and is more difficult to explain to the target audience than SIR.
From the practical standpoint, the need to develop a workable model prior to the publication of [4], as well as the need to have a robust, distributable software solution, necessitated the adoption of a simplified time-dependent form (note that the time dependency of β precludes the use of an analytical solution described in [9]).
where S(t)-fraction of the population susceptible to the disease, I(t)-fraction of the population currently infected with the disease, β(t)-coefficient of transmission, V(t)-fraction of the population that has been vaccinated, γ-fraction of the infected population removed from further consideration due to (permanent) recovery or death.
We are not concerned with the dynamics of the recovered population, and hence leave the "R" term out of Eqs (1) and (2). Since the dynamics for β(t) and V(t) are not known beforehand, they are not included in the differential equations as separate terms, and are instead left to be determined at a later discretization stage. For simplicity, we disregarded mobility considerations reviewed in [25,26], as well as implemented barriers to transmission (masking, social distancing and quarantine measures) [27].

Numerical solution
Data extrapolation and scenario analysis. At the onset of the pandemic, there is no reliable way to determine the true number of infected patients and hence the transmission coefficient β(t). While initial attempts were made to infer likely epidemiological dynamic from countries where the initial stage had by that time already passed [10], the validity of this approach was questionable even at that time since different locales exhibited different curve characteristics. In view of this, the approach adopted for the purpose of constructing a robust model applicable to local conditions was as follows: 1. assume that the number of observed NS cases reflected the actual count of the disease in the population. While this was certainly not the case initially, the accuracy of that number increase over time as testing became more prevalent and comprehensive; moreover, it is fair to assume that those inaccurate numbers reflected the qualitative dynamic of the pandemic; 2. extrapolate the evolution of β(t) implied by the historical data (initially, piecewise-constant; subsequently, polynomial or 7-day moving average); 3. repeat 1-2 for the Chicago / Cook / Lake county (CCL) area containing the majority of the NS catchment area; 4. construct a dynamic (time-dependent) ratio of NS to CCL cases and assume that it accurately reflects the proportion of CCL patients attributable to NS; (1) and (2) separately for CCL and NS;

solve Eqs
6. use the minimum and maximum case number estimates from step 5 as boundaries for the expected number of NS cases.
Specifically, for step 2, we need to find the value of β(t) that delivers an exact solution to (1) and (2) at t (more on this in subsection below). We can do this by equating the number of newly discovered cases in the NS population less the number of those newly vaccinated to the instantaneous decline in the susceptible population (since the latter is monotonically decreasing, i.e., dS(t) < 0, −dS(t) > 0 represents the number of patients who have been infected or vaccinated at time t): Eq (3) applies to the CCL population as well.

Numerical solution of the SIR equations
Conventionally, (1 and 2) are solved numerically using the 4-th order explicit Runge-Kutta method [28]: The Runge-Kutta method (4-13) is explicit and therefore inherently unstable, however, it is conventionally applied for h = 1. The justification of this can be found, e.g., in [29].

Estimating the number of potential NS patients
Eqs (1) and (2) are written in terms of population percentages, i.e.,

SðtÞ ¼ SðtÞ
IðtÞ ¼ IðtÞ VðtÞ ¼ VðtÞ where SðtÞ-NS population susceptible to the disease, IðtÞ-NS population currently infected with the disease, VðtÞ-NS population that has been vaccinated, N(t)-NS population at time t. In order to estimate N(t), we assumed that the current proportion of NS cases relative to the observed CCL cases is indicative of the fraction of the CCL population that potential NS patients represent. In other words, where DI þNS ðtÞ-newly discovered NS cases at time t, DI þCCL ðtÞ-newly discovered CCL cases at time t, N CCL -CCL population (deemed constant). The left hand side of (17) is time-dependent. This seemingly contradicts the static assumption for N implied by the form of (1 and 2). One could, at least partially, refute this objection by pointing out that N is an estimate at time t of the true NS population. The exact number of potential NS patients in the NS catchment area is unknown since potential future NS patients may have had no prior contact with NS facilities and, conversely, those who have sought treatment at NS in the past may chose an alternative provide for their emergency care and subsequent recovery. It is presumed to approach the exact (steady-state) value asymptotically. The rationale for that is mostly empirical, however, one could hypothesize that, as the epidemic progresses, patient flow distributions across healthcare providers tend to stabilize. In fact, as can be inferred from Fig 1, empirical data derived from positive test results and patient censuses in various parts of the hospital suggests exponential decay of N N CCL that can be approximated by where α > 0, β > 0, μ < 0-empirically determined constants: α = 0.452, β = 1.61, μ = −0.012 (the presented empirical fit was performed on the data between Oct. 31, 2020 and May 30, 2021). U.S. Census Bureau [30] estimates the population of the CCL area to be 5,846,768 residents as of July 2019 (the latest data available at the time of writing). From (18), we can obtain the population estimate for NS to be approximately 392,000.

Projecting the number of hospitalizations, ICU and vent placements and deaths
In order to forecast the number of patients requiring general beds, ICU placement or intubation, we assume that, at any given time, a patient can be observed in any of the following states: • on the floor but not in the ICU (and not intubated; lower acuity); • in the ICU but not intubated (elevated acuity), • intubated (highest acuity).
The flowchart in Fig 2 represents the progression of a hospitalized patient through his or her stay in the hospital. The following simplifying assumptions have been made: • floor (lower acuity), ICU and intubated patients are accounted for separately, i.e., those are mutually exclusive groups; • a patient is initially placed on the floor. If their condition is grave, transfer to ICU and / or intubation occurs (almost) instantly; • upon deterioration, a patient proceeds from the floor to the ICU to intubation. No stages in this sequence are skipped, but a patient can spend almost no time in any state and be transferred to a higher acuity stage instantly (in other words, if a severely ill patient expires without being transferred to the ICU and / or being intubated, we consider that patient to have instantaneously transitioned through those two stages to mortality); • upon improvement, a patient proceeds from intubation to the ICU to the floor as applicable.
No stages in this sequence are skipped but a patient can spend almost no time and be transferred to a lower acuity stage instantly; • there is no formal restriction on how many times a patient can deteriorate or improve.
Under those assumptions, the state equations describing the population dynamics inside the hospital are , setting the length of the lookback period N B to be the same for all three groups of patients, N B � N F = N ICU = N vent reduces (19)(20)(21) to In other words, hospitalization, ICU and vents rates are computed exactly as rolling N-day averages up until the current time, and then extrapolated as averages over the same time period going forward. It does not appear possible to define those rates smoothly since the calculation of the rate itself depends on the predicted number of affected patients which, in turn, depends on the rate creating a "circular reference".
The numbers of hospitalizations, ICU and vent placements are specific to the population served by a healthcare system and can be extrapolated to other entities only with caution. At the beginning of the pandemic, state-wide and regional data was either not available or unreliable thus necessitating an approximation using NS census and deaths. In doing so, the number of patients entering the hospital floor, ICU units and being intubated was assumed to be proportional to the observed number of cases.
In order to predict the counts (censuses) of the patient population currently hospitalized, placed in the ICU and intubated, it is necessary to model the flow of patients through each of those units. This can be done by backing out ("bootstrapping") recovery rates from the observed population dynamics as follows:  DH À F ðtÞ-number of patients removed (due to discharge, placement in the ICU or death) from the floor at time t, DH À ICU ðtÞ-number of patients removed (due to return to the general floor population, intubation or death) from the ICU at time t, DH À vent ðtÞ-number of extubations (due to extubation or death) at time t, N F -length of lookback period for floor patients (at the time of this writing, 7 days), N ICU -length of lookback period for ICU patients (7 days), N vent -length of lookback period for intubated patients (14 days), μ F (t)-observed floor removal rate at time t, μ ICU (t)-observed ICU placement rate at time t, μ vent (t)-observed extubation rate at time t. Eqs (31)-(33) can be used to determine the values of μ F , μ ICU and μ vent for t � T. For t > T, moving average extrapolations are used (cf. (22)(23)(24)): Following the patient flow assumptions reflected in Fig 1, mortality rate is computed as where M(t)-mortality rate at time t, MðtÞ-cumulative number of deaths at time t, t � t N -current time.

Projecting vaccination rates
Vaccination rates in the CCL area at the time of this writing followed a quartic trajectory with remarkable accuracy, as shown in Fig 3. Empirically, the shape of the quartic parabola is determined using the usual least squares best fit to be V ð4Þ CCL ðtÞ ¼ À 5:667435 � 10 À 3 t 4 þ 7:590466 t 3 À 3:619223 � 10 3 t 2 þ7:302425 � 10 5 t À 5:232056 � 10 7 : The number of significant digits in 38 is extended for consistency.
Imposing the upper limit of 100% of the population and requiring that the number of vaccinated individuals be monotonically nondecreasing, we obtain (based on the official CCL vaccination data from Jan. 3, 2021 to May 30, 2021) The number of fully vaccinated NS patients is then obtained from (18) as

Worked example
In the example below, we assume that CCL population is 5,846,768, relevant NS patient population estimated by (18) is 392,000 and infection transmission period 1/γ = 15. We set the pandemic start date to March 10, 2020. The date for this example was arbitrarily chosen from past history with the requirements that the number of new cases, patient admissions and censuses on the floor, in the ICU and attached to a ventilator be reasonable large to avoid instability and that the trajectory of the pandemic be fairly well established.
NS case and admission data as of Feb. 26, 2021 is presented in Table 1.
The last row of the table is incomplete because, while reliable case data is generally available up to and including the next-but-last day, case and hospitalization data is relatively reliable for the preceding day.
Calculations for the Runge-Kutta implementation of the model for CCL as of Feb. 26, 2021 are presented in Table 2, and for NS they are displayed in Table 3.
With the line corresponding to the date of 2/24/2021 as an example, the algorithm proceeds as follows: 1. Using Excel Solver's unconstrained GRG (generalized reduced gradient) nonlinear optimization with centered difference approximation and automatic scaling with constraint precision = convergence tolerance = 10 −3 (or any other optimization routine), find the value of β on the preceding day that minimizes the square of the residual between the predicted and observed number of cases at time t: wherê I ðt 0 ; t i Þ-predicted cumulative number of identified positive cases from the beginning of the pandemic t 0 to time t i , Iðt 0 ; t i Þ-actual cumulative number of identified positive case from the beginning of the pandemic t 0 to time t i , N(t)-NS population at time t i (i.e., the asymptotic limit of (18). t i -current time at the i-th time step. The value of β at which this minimum is achieved corresponds to the preceding time step, 2/23/2021, and is equal to 0.043 (rRounded to two significant digits)
3. Compute the NS susceptible rate (cf. 12) The above yields S(t i ) = 0.8604 − 6.35 × 10 −4 × 1 = 0.8598.  6. Compute the infection rate     twice a week, on Mondays and Thursdays with a few exceptions around statutory holiday. Forecasts produced between the periods of calculation were classified as "one-day-ahead" (even though they may have been issued 1 to 6 days in advance); one-and two-week-ahead predictions were also considered. Two sets of predictions were issued on each occasion: one based on the NS data, the other extrapolated from the CCL data adjusting for the then-current share that NS population represented in the CCL pool according to (17 and 18). The synthesis of two disparate sources required a different metric than, e.g., weighted interval score, employed for this purpose in [31,32]. For practical purposes, we adopted a simplified approach described below (in general, our conclusions about the accuracy of the developed model, although arrived at through different means, are similar in nature to those reached in [32] with respect to the short-term forecasting model for Germany and Poland). The minima and maxima of projections thus generated were considered the lower and upper forecast boundaries. If subsequently realized values fell within those boundaries, the corresponding error was set to zero, otherwise it was taken to be the absolute relative error of the most accurate boundary (upper or lower).

Compute the NS vaccination rate from (39 and 40) and
The results are presented in Table 4. Evidently, the best predictions in terms for the number of positive cases, floor, ICU and vent censuses are achieved one day in advance, and the accuracy deteriorates with the increase in the time horizon. This was to be expected. The accuracy of mortality predictions is less dependent on the time horizon, and the relationship between the former and the latter is less pronounced. This could be explained by the relative stability of the number of mortality cases and the relatively static nature of (37).
Accuracy trajectories for the number of positive cases, inpatient, ICU and vent censuses are presented in Fig 4. It can be observed that the most accurate predictions to date have been made during periods of relative "calm", i.e., those times when the infection curve followed a declining or quasi-static pattern (approximately, May-September 2020). Periods of elevated error include "regime changes" at the end of May 2020 and September 2020 to mid-January 2021. This was also to be expected given the uncertainty not captured by the moving average or polynomial extrapolation of the future transmission coefficient, R 0 . Overall, the model appears to have "erred on the side of caution" overestimating the expected patient census while ICU and vent censuses tend to be underestimated more often, especially during the periods of "regime change".

Practical application
The model was distributed as an Excel spreadsheet to NS executive administrative team, physician, nursing and supply chain leadership. This form of distribution allowed those interested in scenario analysis and additional forecasting specific to their line of business the flexibility to perform additional calculations independently, without the need to engage the Clinical Analytics team. During the initial period from Mar. 26, 2020 to June 18, 2020, the forecast was sent out daily. From June 22, 2020 to May 31, 2021, the distribution frequency was set to twice a week. The recipients were advised to treat any outlook more than two weeks ahead of the distribution date with caution. With this caveat, the forecasts were used as a general guide for nearterm future allocation of staff, hospital beds, resources and personal protective equipment. While it is difficult to quantify the impact of the model on hospital operations, according to the feedback received form stakeholders, the forecasts provided enough lead time to serve an additional data point in making operational decisions during the most difficult times of COVID-19. According to the general feedback from forecast recipients, prediction accuracy, the level of provided detail and the timing of distribution were adequate for rendering the model useful for the purpose it was intended for.

Conclusion
The model provided acceptable predictive accuracy for the operational stakeholders to use it as an additional data point in their decision-making process. While specifically designed for COVID-19, the algorithm used for implementing the model and distributing the results could be replicated for a similar communicable disease with high transmission, significant hospitalization, near zero reinfection and moderate mortality rates, provided sufficient requisite data describing its evolution in the community is available.