Monitoring Italian COVID-19 spread by a forced SEIRD model

Due to the recent evolution of the COVID-19 outbreak, the scientific community is making efforts to analyse models for understanding the present situation and for predicting future scenarios. In this paper, we propose a forced Susceptible-Exposed-Infected-Recovered-Dead (fSEIRD) differential model for the analysis and forecast of the COVID-19 spread in Italian regions, using the data from the Italian Protezione Civile (Italian Civil Protection Department) from 24/02/2020. In this study, we investigate an adaptation of fSEIRD by proposing two different piecewise time-dependent infection rate functions to fit the current epidemic data affected by progressive movement restriction policies put in place by the Italian government. The proposed models are flexible and can be quickly adapted to monitor various epidemic scenarios. Results on the regions of Lombardia and Emilia-Romagna confirm that the proposed models fit the data very accurately and make reliable predictions.


Introduction
The recent evolution of the COVID-19 epidemic has renewed the interest of the scientific and political community in the mathematical models for the epidemic. Many researchers all over the world are proposing new refined models to analyse the present situation and predict possible future scenarios (see for example [1][2][3][4][5][6]).
With this paper, we hope to contribute to the ongoing research on this topic and to give a practical instrument for a deeper comprehension of the outbreak evolution.
The modelling of epidemics is currently performed by Ordinary Differential Equations (ODEs) deterministic compartmental models [7,8], or by stochastic procedures [9]. We consider here deterministic compartmental models, based on a system of initial value ODEs, whose theory has been studied since the beginning of the century by W.O. Kermack and A. G. MacKendrick [10] who proposed the basic Susceptible-Infected-Removed (SIR) model. Since then, many modifications have been developed to study the epidemics of different infectious diseases [7]. These models split the population into groups, compartments, and reproduce their behaviour by formalising their reciprocal interactions. For example, the SIR model groups are: susceptible, who can catch the disease, infected, who have the disease and can spread it, and removed, who have either had the disease or have recovered, are immune or a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 isolated until recovery. The Susceptible-Exposed-Infected-Removed (SEIR) model also considers the exposed group, containing individuals who are in the incubation period. Since we believe that relevant information concerns not only infected but also Recovered and Dead populations, we choose to split removed population into Recovered and Dead, obtaining the SEIRD model. Such an approach is similar to [11], without accounting for infections from deceased to susceptibles, that do not apply to  Compared to previous outbreaks, such as SARS-CoV or MERS-CoV [2], the COVID-19 epidemic had a more rapid spread and it was proclaimed pandemic by the WHO on 11/03/ 2020. Indeed, the number of infected people would grow exponentially, if not contained by social isolation of the affected areas, as first evidenced by the COVID-19 outbreak in the Chinese city of Wuhan in December 2019 and currently applied almost worldwide.
In particular, the Italian government has started to impose severe restrictions since 09/03/ 2020 registering a substantial reduction in the growth rate of infected people ever since. The introduction of such social restricting measures requires an adaptation of the standard epidemic models to this new situation.
The evolution of the infected population depends on the basic reproduction number, denoted as R 0 , which measures how transferable a disease is. This quantity determines whether the infection will spread exponentially (R 0 > 1), die out (R 0 < 1), or remain constant (R 0 = 1). In this paper we propose a time-dependent infection rate function R 0 (t), instead of a constant parameter, since we believe that it could provide a model that better represents the COVID-19 outbreak evolution in Italy. The idea of introducing a non-constant infection rate has been adopted in several different situations. See for example in [7], ch. 5, where periodic infection rate functions model influenza epidemic seasonality, naming such modified models as forced models. In [12], an exponential infection rate function was used to represent the Ebola outbreak. Recently, in [13,14] the authors have proposed multi-scale models with several timedependent parameters, to study COVID-19 epidemic.
In this paper, we propose two infection rate piecewise functions and we calibrate the two forced SEIRD models employing the COVID-19 Italian data, relative to Lombardia and Emilia-Romagna regions. The actual data is relative to about three months, where the peak of the infected population has already been reached. The calibrated models yield excellent data fit on both regions. Moreover, we have simulated a prevision based on early-stage epidemic data, relative to the first 30 days: the comparison of the results with the real epidemic evolution shows a difference of very few days between the real and predicted peaks.

Materials and methods
In this section we introduce the epidemic data used for our analysis of COVID-19 in Italy, we present the proposed mathematical model, the method used for calibrating the parameters and the strategies applied for predictions.

Epidemic data and containment measures in Italy
In our analysis we refer to the dataset of the Italian Civil Protection Department, which is described in [15] and publicly available in the GitHub repository [16]. The data have been collected since 24/02/2020. We consider the infected population I as the infected active cases (field name: totale_attualmente_positivi in Table 1 [15]). The Recovered R and Dead D compartments are given in the fields dimessi_guariti and deceduti respectively (Table 1 [15]). This study considers two Italian regions, Lombardia and Emilia-Romagna.
On 09/03/2020 lockdown was declared for the entire country, while more severe restrictions were adopted in the different regions. For example in Lombardia, the Codogno municipal area was locked down from 21/02/2020 up to 08/03/2020, conversely in Emilia-Romagna the complete closure of the Medicina municipal area was applied from 16/03/2020 up to 04/ 04/2020 [17]. Therefore, we have chosen to calibrate the models in each region separately.
Further information about COVID-19 in Italy can be found at [18].

The proposed forced SEIRD model
The epidemiological compartmental models divide the population into groups, whose evolution in time is described by continuous functions, and describe the relations between the groups with ODEs. In this paper we use a SEIRD model [7,11], which considers five population compartments: susceptible (S), exposed (E), infected (I), Recovered (R) and Dead (D).
The system of equations in the SEIRD model is given by: where N is the total population, i.e. N = S + E + I + R + D at each time t, β is the infection rate, i.e. a coefficient accounting for the susceptible people get infected by infectious people, α is the incubation rate for the transition from exposed to infected, T I is the average infectious period and f is the fraction of all the removed individuals who die. The basic reproduction number R 0 has the following expression: The system (1) is solved by starting from an initial time t = t 0 where the values of the populations S(t 0 ), E(t 0 ), I(t 0 ), R(t 0 ), D(t 0 ) are assigned on the basis of the available data and integrated up to a final time T. However, the SEIRD model (1) with constant parameters β, α, f does not fit well the available data for more than few days, due to the rapidly changing social scenarios during the initial period of the COVID-19 spread in Italy. In particular, since the applied restrictions, described in the previous section, cause a decrease in the number of contacts between infected and susceptible, we model the coefficient β in (1) as a time-dependent decreasing function β(t), yielding a forced SEIRD model fSEIRD [7].
Moreover, to improve the model flexibility, we split the integration interval [t 0 , T] into p sub-intervals [t k , t k+1 ], k = 1, . . . p and propose two alternative piecewise infection rate functions: a rational function β r (t) or an exponential function β e (t). In each sub-interval [t k , t k+1 ] the infection rate functions have the following expression: for an assigned starting value β(t 0 ). The parameters α and f are assumed to be constant on each sub-interval, hence they are represented by piecewise constant functions: The proposed fSEIRD model is expressed as follows: with the following time dependent basic reproduction function: In the following, we name fSEIRDr and fSEIRDe the model (6) with the rational infection rate (3) and the exponential infection rate (4), respectively.

Parameter estimation and prevision
In order to estimate the parameters α k , f k and ρ k in (3), (4) and (5), we fit the solution of (6) to the measured data of the infected, recovered and Dead populationsÎ;R;D, relative to M days starting from 24/02/2020. We calibrate the parameters of fSEIRDr or fSEIRDe by solving a nonlinear least-squares problem with positivity and bound constraints. Mathematically, the problem can be formulated as follows. Let u(t) be the multi-value function: uðtÞ : ½t 0 ; T� À ! R 5 ; uðtÞ ¼ ðSðtÞ; EðtÞ; IðtÞ; RðtÞ; DðtÞÞ; solution of the ODE system (6) and let be the vectors of the model parameters. The function u(t) depends on the unknown parameters a, f, r, hence we write u(t; a, f, r) � u(t). We define the restriction of u(t; a, f, r) to the three Conceptually, this least-squares optimization is equivalent to a maximum likelihood estimation, where the likelihood of data given parameters is a multivariate normal distribution with mode v and spherical unit covariance. Eq (8) may be interpreted as the minimum of the negative log of this likelihood. The iterative trust-region based method implemented by the lsqnonlin Matlab function is applied to solve problem (8) (see [19] for details about this aspect). It is well known that the nonlinear problem (8) may have many local minima and that the iterative method, implemented by the solver, converges to one of them. Furthermore, the starting guess is discriminatory for the accuracy of the computed solution. To choose suitable starting guesses approximating the real parameters, we perform a two phase process, where in phase 1 we estimate the parameters β, α, f of the classical SEIRD model (1) on a restricted number of days M l < M [20] and then, in phase 2, we calibrate the parameters of fSEIRDr or fSEIRDe by applying the solutions of phase 1 as starting guesses. Indeed, the identification process in phase 2 requires initial values α 0 , f 0 , ρ 0 for the iterative process solving (8) and β(t 0 ) for the computation of the functions β(t) (3) or (4), therefore the parameters α, f, β computed in phase 1 are assigned as starting guesses. The starting value ρ 0 is fixed as ρ 0 = 0.9. To define the intervals [t k , t k+1 ] in Eqs (3) and (4), we fix a value Δ t > 0 and partition the measurements allowing the length of last interval [t p−1 , t p ] to be larger than Δ t > 0. Finally, we apply (8) to compute the parameters a, f, r.
After having carried out the estimation of the model parameters we use fSEIRDr, fSEIRDe to monitor the epidemic evolution and make some predictions about the infection behaviour in the successive few weeks. This information is extremely important to predict the length of the epidemic spread.

Results
In this section, we test the fSEIRDe and fSEIRDr models using data of Lombardia and Emilia-Romagna regions. In paragraph Model Calibration, we calibrate the two models on the whole time interval available in the dataset from 24/02/2020 to 24/05/2020, which includes the epidemic Infections peak. Then in paragraph Epidemic Evolution Forecast we test the models behaviour restricting the calibration time to the interval [24/02/2020, 27/03/2020], to monitor the COVID-19 evolution and forecast of the epidemic peak.
The differential systems are solved applying the ode45 Matlab function, implementing a variable step Runge-Kutta method based on Dormand-Prince formulae, with the following initial conditions: Iðt 0 Þ ¼Îð1Þ, Rðt 0 Þ ¼Rð1Þ, Dðt 0 Þ ¼Dð1Þ whereÎð1Þ;Rð1Þ;Dð1Þ correspond to the infected, recovered, Dead individuals in the first measurement day. Concerning the exposed population, for which no measurement is available, we set E(0) = 10 � I(t 0 ) in Emilia-Romagna. This value is reasonable for the initial epidemic evolution, leading to a basic reproduction index R 0 ' 6.6. Concerning the region of Lombardia, the same initial value E(0) = 10 � I(t 0 ) leads to an excessively high reproduction index. However, since the available data in the first days of the outbreak diffusion were uncomplete, we have decided to use that value of E(0) and concentrate our analysis in the subsequent times of the pandemic (from day 20th onwards). In future software versions, the value of E(0) could be possibly estimated from data.
where X mod represents a single population among the modelled compartments ({I,R,D} respectively) and X data is the corresponding compartment data in the calibration days t(i), i = 1, 2, . . ., N c , N c � M, and the Bayesian Information Criterion (BIC) [21], defined as follows: where N p represents the number of the parameters estimated by the model. The RMSE measures the average error performed by the model in predicting the outcome for an observation while the BIC takes into account the number of model estimated parameters and tends to penalize the inclusion of additional parameters. In both cases the best models are given by the lowest values.
In this study we choose as the average infectious period T I = 20(d), the time in which the viral RNA may be detected by means of laboratory procedures, as reported in [3]. This value is different from the infectious period reported in [22] which is variable in the interval [2(d),

Model calibration
To calibrate fSEIRDr and fSEIRDe models we perform the phase 1 to obtain the starting guesses to be used in phase 2. We apply the SEIRD model (1), with initial parameters α = 0.1 d −1 , f = 0.1 and β = 0.25 d −1 , on the data [16] relative to the first 15 days [24/02/2020, 09/03/2020] for both regions. The estimated parameters, applied as starting guesses in phase 2, are reported in Table 1. Concerning phase 2, we calibrate the fSEIRDr or fSEIRDs parameters using the available data relative of the first 90 days from 24/02/2020 up to 24/05/2020 and set ρ 0 = 0.9. We first investigate the choice of time partitions (9) by changing the value of the intervals length Δ t . Such value should balance the increasing number of parameters, when Δ t is small, with the increasing value of the RMSE, for large Δ t values. This behaviour has been studied by computing the BIC i values corresponding to each D t 2 I, where I � f3; 5; 7; 14; 21; 28gdays and then computing the values of the scaled BIC parameter [23]: where BIC min is the minimum value of the BIC i and Ik is the number of elements in the set I; the best value is obtained when Δ(BIC) i = 0. Comparing the plots reported in Figs 1 and 2, we observe that the two models reach the minimum BIC when Δ t is 7 for Emilia-Romagna data. For Lombardia we observe that fSEIRdr has very small values for 3 � Δ t � 14 while fSEIRDe is very sensitive to Δ t and reaches its minimum when Δ t = 14. Therefore, in the following we use  Regarding Lombardia region, we observe a quite good fit of the Recovered population ( Fig  6) while the Dead population is well fitted in the first 40 days, successively the model tends to over estimate the data (Fig 8). Conversely, for Emilia-Romagna data, the good fit of the infected population does not extend to Recovered and Dead compartments, although the Recovered data are quite well fitted in the last 10 days (Fig 5).
The two models show similar fitting properties, as confirmed by the BIC and RMSE values in Table 2 as well as by the infected, (Figs 3 and 4), Recovered (Figs 5 and 6) and Dead (Figs 7  and 8) populations. Finally, we observe in Figs 9 and 10 the different behaviour of the infection rate function β(t) in the two considered regions. The exponential function β e has a steeper decreasing behaviour, compared to the rational function β r . Hence we expect that fSEIRDe forecasts a reduction of the epidemic spread in a shorter time than fSEIRDr. Conditioned upon this model space, from Table 2 there is strong evidence in favour of the fSEIRDr model as a description of the epidemic in Lombardia (difference in BIC of 28), and weak evidence in favour of a fSEIRDe model in Emilia-Romagna (difference in BIC of 2).

Epidemic evolution forecast
In this section we investigate how these models could be used at an early stage of the epidemic evolution, to see how and to what extent they could yield useful information in terms of predicting the epidemic peak of the infected population. To this purpose, we calibrate fSEIRDe   and fSEIRDr using the data from 24/02/2020 to 27/03/2020 and then we use the calibrated models to make previsions until 23/06/2020. Observing the results reported in Table 3 we see that the infected peaks are localized quite accurately (maximum three days error). In Emilia-Romagna, both models over-estimate the real peak populations of about [70%-90%]. Conversely, the situation in the Lombardia region is more complicated, and the two models present two different possible evolutions. In this case only fSEIRDr localizes the peak precisely, still overestimating the infected numbers (137%). On the contrary, fSEIRDe finds an epidemic peak 21 days earlier but with a milder underestimate.
The global trend, represented in Figs 11 and 12 for Emilia-Romagna and Lombardia, confirms that fSEIRDr and fSEIRDe can be applied to predict possible epidemic evolutions even

PLOS ONE
from early-stage data. Hence, the different behaviour of the models can be usefully applied to predict different possible future scenarios. Concerning the basic reproduction functions RðtÞ in Fig 13 and in Fig 14, we choose to report it from the 20th day, to focus our observations on the prediction phase. Regarding the fSEIRDe model we observe that RðtÞ < 1 is achieved at t = 41(d) (4 April 2020) in Emilia-Romagna and t = 69(d) (2 May 2020) in Lombardia. The fSEIRDr model never reaches RðtÞ < 1 for Lombardia while for Emilia-Romagna it is reached at t = 116(d) (20 June 2020). In the case of fSEIRDe model the function RðtÞ matches the trend of the infected curve, whereas RðtÞ has too large values for fSEIRDr model. From our observations we believe that two factors cause this misbehaviour: the inaccurate initial value of the exposed population, which should be calibrated, and the evolution of the function α(t), whose change should be adaptively bounded by a more refined calibration procedure.

Conclusion
In this paper, we have proposed a forced SEIRD model and investigated two different infection rate functions for the analysis of the COVID-19 outbreak evolution in Italy. In our new formulation, we have partitioned the integration time into sub-intervals, where the model parameters have been adaptively estimated. The results obtained by fitting the data of two Italian regions, Lombardia and Emilia-Romagna, available from February 24th 2020 until May 24th 2020, show a very good fit to the data. We have then used the model to make predictions by calibrating it only over a short period of about 30 days, and we have compared our prevision with the actual collected data. We believe that the proposed model can be quickly adapted to monitor various infected areas at different epidemic stages. Concerning the Italian epidemic evolution, we are now facing the end of the movement restriction measures, and one crucial challenge is the prediction of potential new epidemic outbreaks, possibly connected to the spread of autumnal influenza. Further studies on forced models will be carried out in this perspective, together with further improvement of the calibration procedure.