Data-driven study of the COVID-19 pandemic via age-structured modelling and prediction of the health system failure in Brazil amid diverse intervention strategies

In this work we propose a data-driven age-structured census-based SIRD-like epidemiological model capable of forecasting the spread of COVID-19 in Brazil. We model the current scenario of closed schools and universities, social distancing of people above sixty years old and voluntary home quarantine to show that it is still not enough to protect the health system by explicitly computing the demand for hospital intensive care units. We also show that an urgent intense quarantine might be the only solution to avoid the collapse of the health system and, consequently, to minimize the quantity of deaths. On the other hand, we demonstrate that the relaxation of the already imposed control measures in the next days would be catastrophic.


Introduction
Coronavirus disease-2019 (COVID-19) caused by the severe acute respiratory syndrome coronavirus (Sars-CoV-2) is a main threat for the public health systems throughout the globe [1][2][3][4][5][6][7][8][9][10][11][12][13]. As of April 03, 2020, the world had more than one million (1.016.401) confirmed cases, 53.160 deaths and 211.775 recovered persons [7] since the first suspected case of (COVID- 19) on December, 2019, in Wuhan, Hubei Province, China [8], with a frightening propagation speed. So far, 52 countries reported more than 1000 cases from all continents, Antarctica being the only exception with no cases. In Brazil, at the time of writing, the COVID-19 statistics was: 8.066 confirmed cases, 327 deaths and 127 recovered individuals [7]. Therefore, a critical aspect of the COVID-19 pandemic, rather than the percentage of infection and even the mortality rate, is the healthcare system capacity. In Brazil  If, on average, one infected person contaminates one or more individuals, the epidemic is sustainable, otherwise it dies out. This is the definition of the basic reproduction number (R 0 ), which represents the average number of secondary cases from a single infectious case in a totally susceptible population (in the very beginning of the infection the susceptible population corresponds approximately to the total population N, S � N), defined as R 0 = β/(α + γ). Therefore, it is very important to reduce transmission rate in order to reduce the subsequent reproduction ratio R n (not to mistaken with the number of recovered individuals) so that R n < 1 and epidemic fades out. In the unlikely worst case scenario in which the government does not impose any measures for containing the infectious disease, R will decrease below unit due to exhaustion of susceptible individuals, at the expense of a tremendous amount of infected and dead individuals, collapsing not only the health system but the economy as well. Let us define S(t), I(t), R(t), D(t), the number of susceptible, infected, recovered and dead individuals, respectively, at time t in a population of size N. The latest Census-based data for Brazil N = 211, 319, 631 [26]. In fact, the variables are fractions of the respective compartments, i. e., s 0 = S/N, i 0 = I/N, r 0 = R/N and d 0 = D/N. Our SIRD model is composed of 36 coupled ordinary differential equations since i belongs to one of the nine age groups as seen in The individuals can flow among four states in each of the age group i: susceptible (S i ) to infected (I i ) with rate β i , who can recover (R i ) at a rate α i or die (D i ) at a rate γ i , the recovered populations do not become susceptible, as is suggested for COVID-19 [12]. The NPIs policies serve as a barrier to contain the infection. The quantity (g i S i ) represents the fraction of people not respecting the NPI policy, therefore still susceptible.
https://doi.org/10.1371/journal.pone.0236310.g001 Table 1, yielding the kinematics of the variables, as follows: where dx/dt is the derivative of variable x with respect to time, hence a time variation. In this manner, the corresponding right hand side of each equation provides the variation amount of the variable in time t, increasing or decreasing if the amount if positive or negative, respectively. Here, S i represents the susceptible individuals for age group i, I i stands for infected individuals in age group i, R indicates the recovered individuals no more susceptible to infection, and D is the death total. The variable I ¼ P 9 i¼1 I i is the total number of infected people in a given time t. Here, β i , α i and γ i denote the apparent per day infection (as suggested by data), recovery and mortality rates for the age group i, respectively. Note that these parameters do not correspond to the actual per day infection, recovery and mortality rates as the new cases of recovered and deaths come from infected cases several days back in time. However, one can attempt to provide some coarse estimations of the apparent values of these epidemiological parameters based on the reported confirmed cases using an assumption and approach described in the next section, see Table 1. The parameter g i represents the governmental strategies and/or public policies for the age group i or a combination of them (see Table 2). In our SIRD model it is assumed that the total number of the population remains constant (dead also included), once the sum Eq 1 implies a null derivative. If the rate at which infected persons need intensive care units (ICU) is known, we can described it as H means the healthcare demand due to hospitalized cases requiring critical attention in ICU, see Table 3. This equation can be coupled to the previous system (1) to estimate the time evolution of the health system demand (H). It is worth mentioning that the critical cases require long hospitalization, meaning that we will neglect the beds vacancies generated by recoveries or deaths. Although the number of ICUs is massively concentrated in capitals and big urban centers [14], this in-homogeneity is not considered in our model, that should be seen as an average over all different regions in Brazil. The model can be improved to account for a contact matrix C ij given the probability of contact between the age groups, however it would be only a heuristic guess, so we decide not to include it here, see for instance [27,28]. Although the contact matrix is still relevant, the infectiousness of infected individuals seems to be almost the same for distinct symptomatic outcomes [29]. It is also worth mentioning that SIRD-like models can be comparable with more complex models such as random graphs networks [30][31][32][33][34].

Modelling non-pharmaceutical interventions (NPIs)
The parameter g i represents the NPI policies and, as in [9], they are not supposed to have full compliance of the individuals. Further, as a by-product such NPIs might generated other kinds of social contacts, for instance, those due to the essential services that continue running even in a mandatory quarantine. For combination of NPIs, one should take the lowest value in each corresponding row of Table 2. So, g i influences directly the spread of the disease, having strong effect on the efficacy of the infection process and can be interpreted as alterations of the Table 2

. Diverse non-pharmaceutical interventions modelled by the parameter g i ranging from the absence of any control measures (No NPI), closure of schools and universities (CSU), social distancing of people with more than 60 years old (SD60+), voluntary home quarantine (VHQ) and intense quarantine (IQ).
It reflects the fraction of individuals still susceptible after the measure is applied per age group. However, none of them are suppose to be fully respected and/or other contacts result as a by-product of the measure [9], except for the case of total closed schools and universities as is happening in Brazil at the time of writing, see more details in the main text. For combination of NPIs, one should take the lowest value in each corresponding row. β i parameter, resulting in an effective b eff i ¼ g i b i due to the imposed control measure. It comes from the reasonable assumption that in the early stage of the infection S � N, therefore it fights the infection by reducing the number of susceptible persons. In our approach it reflects the amount of susceptible individuals undergoing the specific control measure and g i S i represents the fraction of S i not complying with the policy g i . Ahead we discussed the NPIs considered in this work along with the expected response from the population to these measures. In all cases, the compliance is not 100% effective [9] since that, for instance, many essential services are needed, so even in an intense and mandatory quarantine we supposed the 25% of the susceptible persons are still well exposed to the infection (see the fraction 0.25 in the last column of Table 2).
In the lack of a vaccine, it is improbable that no NPI (g i = 1) policy would be applied. To estimate g i (reflecting the fraction of individuals still susceptible after the measure is applied) for the different NPIs and age groups we follow a approach as in Ref [9]. For the case of closure of school and universities (CSU), we use the census-based data of the number of students per age group to reduce the respective number of susceptible individuals. They are all supposed to be uninfected in the early stage of the epidemic and we assume that 100% of this target population will not disobey the policy as all schools and universities are closed in Brazil. The third column of Table 2 is the unit minus the values in the forth column of Table 1. For social distancing of people over 60 years old (SD60+) we assume that 75% will comply with this policy, meaning that 25% will leak the isolation, therefore the fraction 0.25 in the last three rows of Table 2. For voluntary home quarantine (VHQ) and social distancing of the entire population as an intense quarantine (IQ) the compliance is 50% and 75%, meaning that the remaining quantity of susceptible individuals in each age group is factor of 0.5 and 0.25 of their population, respectively. See Table 2 to check g i for each NPI. As the measures are not capable of producing effects instantaneously, taking in general t β = 14 days be effective [22], we have to apply a modulation in the g i parameter in a similar fashion as done in Ref. [22]. Thus, we model g i as where this modulation function is the sigmoid function, t NPI is the date at which the measure is implemented and t β is the number of days it takes to produce effects.

Initial conditions and model calibration
For the model to reflect well the scenarios for the next days according to the adopted public policies, one has to inform the proper initial conditions as well as the model parameters. The corresponding initial conditions (S i (0), I i (0), R i (0), D i (0)) were taken from the data available for the Brazilian case at the time of writing. Differently, from what has been done by many countries, the data from the national ministry of health is not informed in an age-structured manner. For that reason, we collected the data from epidemiological bulletins from the health secretaries from the states of São Paulo, Rio de Janeiro, Ceará, Minas Gerais, Pernambuco, Mato Grosso do Sul, Alagoas and the Distrito Federal, counting 730 infected cases as of 21th March 2020 (the initial time step in our simulations), corresponding to around 65% of the confirmed cases at that date. We assume that this distribution reflects well the overall national distribution. So the input for I i (0) is the multiplication of the infected distribution p[I i (0)] by the official number of infected persons at the March 21, 2020, that is, N infected = 1128. The death distribution is more accurate given that the national ministry of health provides this information directly, as informed in the third column of Table 4. Thus, the input D i (0) is also the multiplication of the death distribution p[D i (0)] by the total number of dead individuals at the time of writing (N dead = 92). The input S i (0) is just multiplication of corresponding age group population percentage (second column in Table 1) by the total population N, discounted the corresponding number of infected in the age group. Because at the time of writing the number of recoveries is negligible (just 6 recovered individuals), we set R i (0) = 0. We remark that this is a common assumption to model infections at early stages, as is the case in Brazil and in many other countries.
The calibration of the model requires robust data so that the model parameters can be as realistic as possible. In the absence of such that for the Brazilian case, we used some data reported in studies with the largest number of individuals with an age-distributed fatality rate (γ i ) [1] and the percentage of persons undergoing critical intense care (c i ) [2]. For the recovery rate (α i ) we use the assumption that α i = (1 − γ i ) � r, where r is the overall fraction of recovery in the closed cases, known so far to be r = 0.82, meaning that 82% of those who did not succumb to the disease are now healed [7]. This does not imply a overall death ratio of 18%, since it accounts only for closed cases, in which one can compute statistics. Although this fraction could change over time, the statistics is reliable since the number of total closed cases is N closed = 191, 623, roughly 25% of the confirmed cases at the time of writing. The parameter β i describes the efficacy of the infection process and can be measure indirectly. Our first assumption is that this efficacy depends weakly on the age group, therefore β i = β is a constant vector. The effective value of β can be computed as x stands for the mean value of the variable x and R 0 reproduction number already defined and calculated to be in the range 1.5-6.0 in many countries [9,11]. We will estimate the R 0 by performing a fit in our model with current data available. Given the current control measures in Brazil (a combination of CSU, HVQ and SD60+) our current g i is the concatenation of the corresponding columns in Table 2, taking the lowest values. This was in accordance with the overall estimates via Google's Community Mobility tool around April 1 for Brazil [38]. The data fitting shown in Fig 2 indicates that in Brazil the R 0 � 3 is still high nevertheless the current control measures. It means that one infected person, on average, is contaminating 3 other individuals. A clear indication that Brazil should not relax the NPIs adopted so far. In fact, as we will show in the next section, even stronger measures are going to be needed to avoid surpassing the capacity of the national health system and, consequently, minimize the number of severe cases and deaths. As we seek to estimate the demand for ICUs and it Table 4. Infected and death percentages per age group in Brazil as of 21th March, 2020. S i (0) is just multiplication of corresponding age group population percentage (second column in Table 1) by the total population N, discounted the corresponding number of infected in the age group. R i (0) = 0 as at the time of writing it was not a significant data (just 6 recovered individuals). It is a common assumption for early stages of the infection, as is the case in Brazil and in many other countries. depends heavily on the number of infected (see Eq 6), we are going to use to value of R 0 which best fits the variable I, so R 0 = 2.9 even though the fit for D seems better.

Results and discussion
With the model initialized and callibrated as described in the previous section, we obtain the results by solving the system of ordinary differential equations (ODEs) given by Eqs (1) to (5) with the Julia Programming Language's package DifferentialEquations [39]. Our main result is to show that, even though the current NPI measure taken in Brazil have led to a substantial decrease in the number of infections as compared to no NPI since the beginning of the reported cases in Brazil (see red and blue curves in Fig 3 (bottom)), the current measures are not enough to prevent the collapse of the health system in a short period of time with million of infected persons (see Figs 3 (top) and 4).
Even with the combination of CSU, HVQ and SD60+ taking place, our model predicts millions of infected individuals, with a peak taking place around the middle of May, 2020 (see Fig  3 (top)) in agreement with the projection for Brazil in Ref. [40], and consequently an exponential increase in the demand for ICUs. In fact, as can seen in Fig 4, already at the end of April we will surpass the current capacity of ICUs. The health system is still under treat in the current scenario, indicated to collapse by the end of Abril, 2020 (around 30 days after t0 == March 21, 2020), vigorously crossing the 5500 ICUs barrier. Moreover, the scenario is even worse if the imposed NPIs are relaxed (as being constantly debated by the Brazilian federal government), pointing out tens of millions of infected individuals, see black curve in Fig 3  (bottom).
On the positive side, we have also identified a window of 25 days-from the March 21st to April 16th-in which, similarly to what has been done in China, if more severe control measures are be applied, one can control the virus spread and keep the ICU demand below the threshold (see the green curve in Fig 4). Given that the actual NPI scenario in Brazil is represented by the combination of the lowest values of the g i functions in Table 2 for CSU, SD60+ and VHQ, the only possible measure is an intense quarantine (last column of Table 2). Except a intense quarantine, all the other scenarios are more catastrophic, meaning a faster collapse of the system. Although an emergence expansion and/or other measures can be tried, it is unlikely to keep the pace with the exponential spread of the infection. Even if we are underestimating the number of available ICUs, the saturation will still take place for the current scenario (as it shows a positive concavity), being only reversed with the intense quarantine implementation.
In our model, this intense quarantine is supposed to be applied around 20 days after our initial time (March, 21). As discussed in the NPI section, no measure is instantaneous efficient, taking on average a time t β � 14 days to be completely noticed [22]. In spite of that, notice that the increase in the ICU demand is rapidly contained. Moreover, it is important to mention that such a intense quarantine should last enough time until we reach a plateau in the ICU demand. Not only the quarantine protects the health system, but also corresponds to a minimization of the number of deaths, massively reducing it.
In Tables 5 and 6, we show the estimated number of infected people and deaths, respectively, in the current scenario as well as if other NPI measures start to take place at April 11th (20 days after March 21st). Observe how the total of infected is still very high in the current scenario: around 3.15 million infected individual, with an astonishing 393 thousands deaths. These estimations is based from the initial date up to 150 days ahead, where the disease is modelled to be controlled. As discussed, the only scenario which considerably reduces the number of infected persons and deaths is an intense quarantine. It reduces to 34.5 thousand infections and 1300 deaths. In particular, we notice that changing the current NPI to SD60+ (f social distancing of people above 60 years old) as been currently discussed by the Brazilian federal government, is completely catastrophic. As this group corresponds to the majority of critical care demands and deaths, their isolation is certainly necessary. However, taken alone it leads to �27 millions infected and 723 thousand dead individuals (see the fifth columns in Tables 5  and 6). An important observation is that our predictions are likely to be a lower bound to the actual numbers, since the confirmed cases are potentially underestimated given the lack of a widespread clinical testing in Brazil with more accurate data, it is possible that the fitting in Fig  2 would produce an even higher value for R 0 . In addition, our model can be applied to other countries or regions with minimal adaptation, since it will require only updates in Tables 1-3. Finally, although we have employed many data-driven assumptions, the results presented here may still underestimate the threat to the national health system due to the particular social problems in Brazil, such as: (i) high level of cardiopathologies, a reported relevant comorbidity; (ii) considerable number of people without proper water and wasting supplies; (iii) large number of people living in the same house in peripheral zones due to housing deficit, (iv) high density of obesity cases and some other immune suppressant diseases, just to cite a few. On the positive side, there can be a potential defense against SARS-Cov-2 for those BCG-vaccinated [41], which belongs to the universal vaccination program in Brazil [42]. Furthermore, the public health system in Brazil has great capillarity, in principle being capable to identify potential cases in the very beginning of the symptoms. However, that also requires widespread clinical tests to identify infected individuals.

Conclusion
In this work we proposed a data-driven age-structured census-based SIRD-like epidemiological model capable of forecasting the spread of COVID-19 in Brazil in a number of NPI scenarios. We remark that our approach is fairly general and thus can be applied to treat particular regions or cities, if the required data is available.
As we have shown, the early NPI measures taken by states and cities such as the total closure of schools, universities and non-essential services, the social distancing and isolation of individuals above 60 years and the voluntary home quarantine have already lead to significant reduction in the number of infections as well as delaying the time for the peak of contamination. Thus, these measures have been extremely important to give the authorities the necessary time for the adapting and preparing before the peak of the epidemy.
Notwithstanding, the current measures are not enough. Our model predicts that even if the current NPIs are not relaxed, as early as mid April the number of severe cases requiring hospitalization will surpass the current number of available ICUs, starting the collapse of the health system. However, a intense quarantine, if implemented in the following days, can rapidly change the increase in the number of infections and keep the demand for ICUs below the threshold, amounting to hundred of thousands of saved lives. On the other hand, we demonstrate that the relaxation of the already imposed control measures in the next days, as currently debated at some sphere of the Brazilian federal government, would be catastrophic, with a total death toll passing the one million mark.
In a nutshell, a continued quarantine, tighter than the current one and with a duration of a couple of weeks, is most likely the only solution to avoid the collapse of the health systems in Brazil. We hope that the gigantic difference in numbers, showing how different measures can lead to a reduction in infections and deaths of the order of hundreds of thousands or even millions, can provide a rational guide for the future decisions by the competent authorities.