Tracing DAY-ZERO and Forecasting the Fade out of the COVID-19 Outbreak in Lombardy, Italy: A Compartmental Modelling and Numerical Optimization Approach

Background. Since the first suspected cluster of cases of coronavirus disease-2019 (COVID-19) on December 1st, 2019, in Wuhan, Hubei Province,


Introduction
The butterfly effect in chaos theory underscores the sensitive dependence on initial conditions, highlighting the importance of even a small change in the state of a nonlinear system. The emergence of a novel coronavirus, SARS-CoV-2, that caused a viral pneumonia outbreak in Wuhan, Hubei province, China in early December 2019 has evolved into the COVID-19 acute respiratory disease pandemic due to its alarming levels of spread and severity, with a total of 195,892 confirmed infected cases, 80,840 recovered and 7,865 deaths in 153 countries as of March 16, 2020 [1]. The old continent, seemingly far from the epicenter, became the second-most impacted region after Asia Pacific to date [...] was declared recovered on February 26 [2]. A 38-year-old man repatriated to Italy from Wuhan, who was admitted to the hospital in Codogno, Lombardy on February 21, was the first secondary infection case ("patient 1"). "Patient 0" was never identified by tracing the first Italian citizen's movements and contacts. In less than a week, the explosive increase in the number of cases in several bordering regions and autonomous provinces of northern Italy placed enormous strain on the decentralized health system. Following a dramatic spike in deaths from COVID-19, Italy transformed into a "red zone", and the movement restrictions were expanded to the entire country on the 8th of March. All public gatherings were cancelled and school and university closures were extended through at least the next month.
In an attempt to assess the dynamics of the outbreak for forecasting purposes, as well as to estimate epidemiological parameters that cannot be computed directly from clinical data, such as the transmission rate of the disease and the basic reproduction number R_0 (defined as the expected number of exposed cases generated by one infected case in a population where all individuals are susceptible), many mathematical modelling studies have appeared since the first confirmed COVID-19 case. The first models mainly focused on the estimation of the basic reproduction number R_0 using dynamic mechanistic mathematical models ([3,4,5,6]), but also simple exponential growth models (see e.g. [7,8]). Compartmental epidemiological models like SIR, SIRD, SEIR and SEIRD have been proposed to estimate other important epidemiological parameters, such as the transmission rate, and for forecasting purposes (see e.g. [6,9]). Other studies have used metapopulation models, which include data on human mobility between cities and/or regions, to forecast the evolution of the outbreak in other regions/countries far from the original epicenter in China [3,10,11,5], including the modelling of the influence of travel restrictions and other control measures in reducing the spread [12].

Among the perplexing problems that mathematical models face when they are used to estimate epidemiological parameters and to forecast the evolution of the outbreak, two stand out: (a) the uncertainty that characterizes the actual number of infected cases in the total population, which is mainly due to the large percentage of asymptomatic or mild cases experiencing the disease like the common cold or the flu (see e.g. [13]), and (b) the uncertainty regarding the DAY-ZERO of the outbreak, the knowledge of which is crucial to assess the stage and dynamics of the epidemic, especially during the first growth period.

This preprint (medRxiv, this version posted March 20, 2020; https://doi.org/10.1101/2020.03.17.20037689) was not certified by peer review. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
To cope with the above problems, we herein propose a novel SEIRD model with two infected compartments, one modelling the total infected cases in the population and another modelling the confirmed cases. The proposed modelling approach is applied to Lombardy, the epicenter of the outbreak in Italy, to estimate the scale of under-reporting of the number of actual cases in the total population and the DAY-ZERO of the outbreak, and for forecasting purposes. These tasks were accomplished by the numerical solution of a mixed-integer optimization problem using the publicly available data of cumulative cases for the period February 21-March 8, the day of lockdown of all of Italy.

The modelling approach
We address a compartmental SEIRD model that includes two categories of infected cases, namely the confirmed/reported and the unreported (unknown) cases in the total population. Based on observations and studies, our modelling hypothesis is that the confirmed cases of infected are only a (small) subset of the actual number of infected cases in the total population [5,13,6]. Regarding the confirmed cases, a study conducted by the Chinese CDC, based on a total of 72,314 cases in China as of February 11, found that about 80.9% of the cases were mild and could recover at home, 13.8% were severe and 4.7% critical [14].
On the basis of the above findings, in our modelling approach, the unreported cases were considered either asymptomatic or mildly symptomatic cases that recover from the disease relatively soon and without medical care, while the confirmed cases include all the above types; on average, their recovery lasts longer than that of the non-confirmed cases, and they may also be hospitalized and die from the disease.
Based on the above, let us consider a well-mixed population of size N. The state of the system at time t is described by (see also Figure 1) [...]

The rate at which a susceptible (S) becomes exposed (E) to the virus is proportional to the density of infectious persons I in the total population, excluding the number of dead persons D. Our main assumption here is that upon confirmation, the infected persons I_c go into quarantine and, thus, do not transmit the disease further. The proportionality constant is the "effective" disease transmission rate, say β = cp, where c is the average number of contacts per day and p is the probability of infection upon a contact between a susceptible and an infected.
Thus, our discrete mean-field compartmental SEIRD model reads: [...]

The above system is defined at discrete time points t = 1, 2, ..., with the [...]

The parameters of the model are:
• β (d^-1) is the "effective" transmission rate of the disease,
• σ (d^-1) is the average per-day "effective" rate at which an exposed person becomes infective,
• δ (d^-1) is the average per-day "effective" recovery rate within the group of unreported (asymptomatic/mild) cases in the total population,
• δ_c (d^-1) is the average per-day "effective" recovery rate within the subset of confirmed infected cases,
• γ (d^-1) is the average per-day "effective" mortality rate within the subset of confirmed infected cases,
• ε (d^-1) is the per-day rate at which infected cases in the total population get confirmed. This proportionality rate quantifies the uncertainty in the actual number of unreported cases in the total population.

Here, we should note the following: as new cases of recovered and dead at each time t appear with a time delay (which is generally unknown, but an estimate can be obtained from clinical studies) with respect to the corresponding infected cases, the above per-day rates are not the actual ones; thus, they are denoted as "effective/apparent" rates.
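The parameter definitions above suggest a simple daily update rule. The following Python sketch is one possible reading of such a discrete SEIRD scheme, not the authors' code. It assumes that the new exposures per day equal β S I / (N − D), that confirmed cases I_c do not transmit, and that the outbreak starts from a single exposed person. The names `Params`, `step` and `simulate`, the initial condition, the population size and the values of δ and γ are illustrative assumptions; β = 0.779, 1/σ = 3 and δ_c = 1/21 are values quoted in the text, and ε = 0.05 is one of the five values examined.

```python
from dataclasses import dataclass


@dataclass
class Params:
    beta: float     # "effective" transmission rate (1/day)
    sigma: float    # rate at which an exposed person becomes infective (1/day)
    delta: float    # recovery rate of unreported cases (1/day)
    delta_c: float  # recovery rate of confirmed cases (1/day)
    gamma: float    # mortality rate of confirmed cases (1/day)
    eps: float      # rate at which infected cases get confirmed (1/day)


def step(state, p, n):
    """Advance (S, E, I, I_c, R, R_c, D) by one day (assumed update rule)."""
    s, e, i, ic, r, rc, d = state
    # Assumption: only unreported infected I transmit; the dead are excluded
    # from the mixing population, hence the denominator n - d.
    new_exposed = p.beta * s * i / (n - d)
    return (
        s - new_exposed,
        e + new_exposed - p.sigma * e,
        i + p.sigma * e - (p.delta + p.eps) * i,
        ic + p.eps * i - (p.delta_c + p.gamma) * ic,
        r + p.delta * i,
        rc + p.delta_c * ic,
        d + p.gamma * ic,
    )


def simulate(p, n, days, e0=1.0):
    """Return the list of daily states, starting from e0 exposed persons."""
    state = (n - e0, e0, 0.0, 0.0, 0.0, 0.0, 0.0)  # hypothetical initial condition
    trajectory = [state]
    for _ in range(days):
        state = step(state, p, n)
        trajectory.append(state)
    return trajectory


# Illustrative values: delta, gamma and the population size are placeholders.
example = Params(beta=0.779, sigma=1 / 3, delta=1 / 7,
                 delta_c=1 / 21, gamma=0.01, eps=0.05)
trajectory = simulate(example, n=1.0e7, days=30)
```

In this sketch every flow leaving one compartment enters another, so the seven compartments always sum to N, which is a convenient sanity check when experimenting with the rates.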

The values of the epidemiological parameters σ, δ, δ_c that were fixed in the proposed model were chosen based on clinical studies.
In particular, in many studies that use SEIRD models, the parameter σ is set equal to the inverse of the mean incubation period (time from exposure to the development of symptoms) of a virus. However, the incubation period does not generally coincide with the time from exposure to the time that someone starts to be infectious. Regarding COVID-19, it has been suggested that an exposed person can be infectious well before the development of symptoms [15]. With respect to the incubation period for SARS-CoV-2, a study in China [16] suggests that it may range from 2-14 days, with a median of 5.2 days. Another study in China, using data from 1,099 patients with laboratory-confirmed 2019-nCoV ARD from 552 hospitals in 31 provinces/provincial municipalities, suggested that the median incubation period is 4 days (interquartile range, 2 to 7). In our model, as explained above, 1/σ represents the period from exposure to the onset of the contagious period. Thus, based on the above clinical studies, for our simulations, we have set 1/σ = 3 days.

Regarding the recovery period, the WHO-China Joint Mission, in a study based on 55,924 laboratory-confirmed cases, has reported a median time of 2 weeks from onset to clinical recovery for mild cases, and 3-6 weeks for severe or critical cases [17]. Based on the above and on the fact that within the subset of confirmed cases the mild cases are 81% [14], we have set the recovery rate for the confirmed cases' compartment to δ_c = 1/21 d^-1 in order to balance [...]
Thus, for each one of the above values of ε, the DAY-ZERO of the outbreak, the per-day "effective" transmission rate β and the "effective" per-day mortality rate γ were computed by the numerical solution of a mixed-integer optimization problem with the aid of genetic algorithms to fit the reported data of confirmed cumulative cases from February 21 to March 8, the day of the lockdown of Lombardy.
Here, for our computations, we have used the genetic algorithm "ga" provided by the Global Optimization Toolbox of Matlab [4] to minimize the following objective function: [...]

where ∆X_SEIRD(t) (X = I, R, D) are the cumulative cases resulting from the SEIRD simulator at time t, and w_1, w_2, w_3 are scalars serving in the general case as weights to the relevant functions.
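As an illustration of the fitting criterion described above, the sketch below computes a weighted sum of squared differences between simulated and reported cumulative series for I, R and D. The squared-error form, the function name `weighted_sse` and the dictionary layout are assumptions made for this example; the text only states that w_1, w_2, w_3 weight the three terms.

```python
def weighted_sse(simulated, reported, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of squared errors over the cumulative I, R and D series.

    simulated, reported: dicts with keys "I", "R", "D", each mapping to a
    list of cumulative counts over the same days (assumed layout).
    weights: the scalars (w1, w2, w3) applied to the I, R and D terms.
    """
    total = 0.0
    for weight, key in zip(weights, ("I", "R", "D")):
        sim, obs = simulated[key], reported[key]
        if len(sim) != len(obs):
            raise ValueError(f"series {key!r} have different lengths")
        total += weight * sum((s - o) ** 2 for s, o in zip(sim, obs))
    return total
```

Setting a weight to zero drops the corresponding series from the fit, which is one way to fit confirmed cases alone.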
In order to get the 90% confidence intervals for β and γ (as these are not provided by the genetic algorithm), we fixed the DAY-ZERO for the simulations and ran the Levenberg-Marquardt algorithm around the optimal solution, as implemented by the "lsqnonlin" function of Matlab [19].
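How the intervals were derived from the least-squares output is not detailed in the text. One common route, sketched here under stated assumptions, is a linearized estimate: a finite-difference Jacobian of the residuals at the optimum gives the covariance s² (JᵀJ)⁻¹, from which symmetric intervals follow. The function name `confidence_intervals` is hypothetical, the sketch handles exactly two parameters (such as β and γ), and it uses a normal quantile where a Student t quantile would be more appropriate for small samples.

```python
from statistics import NormalDist


def confidence_intervals(residuals, theta, level=0.90, h=1e-6):
    """Linearized confidence intervals for a two-parameter least-squares fit.

    residuals(theta) -> list of residuals; theta is the fitted optimum.
    Assumption: a normal quantile replaces the Student t quantile.
    """
    if len(theta) != 2:
        raise ValueError("this sketch handles exactly two parameters")
    r0 = residuals(theta)
    m = len(r0)
    if m <= 2:
        raise ValueError("need more data points than parameters")
    # Forward-difference Jacobian, one column per parameter.
    cols = []
    for j in range(2):
        step = h * max(1.0, abs(theta[j]))
        shifted = list(theta)
        shifted[j] += step
        rj = residuals(shifted)
        cols.append([(a - b) / step for a, b in zip(rj, r0)])
    # Entries of the symmetric 2x2 matrix J^T J = [[a, b], [b, c]].
    a = sum(x * x for x in cols[0])
    b = sum(x * y for x, y in zip(cols[0], cols[1]))
    c = sum(y * y for y in cols[1])
    det = a * c - b * b
    s2 = sum(r * r for r in r0) / (m - 2)   # residual variance estimate
    variances = (s2 * c / det, s2 * a / det)  # diagonal of s2 * (J^T J)^-1
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return [(t - z * v ** 0.5, t + z * v ** 0.5)
            for t, v in zip(theta, variances)]
```

With the handful of daily observations available here, a t quantile would widen these intervals somewhat, so the sketch should be read as a lower bound on the uncertainty.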
Thus, for each one of the five values of ε, we have repeated the above numerical optimization procedure fifty times and we kept the best fitting outcome.
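The repeat-and-keep-the-best strategy can be sketched independently of the optimizer. In the Python example below a plain random search stands in for the genetic algorithm, which is a deliberate simplification and not the Matlab "ga" routine. The function names, the bounds and the sample counts are hypothetical; the one feature carried over from the text is that one decision variable (DAY-ZERO) is an integer while the others are continuous.

```python
import random


def random_search(objective, bounds, int_index, n_samples, rng):
    """One stochastic run: sample candidates uniformly and keep the best."""
    best_x, best_f = None, float("inf")
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        x[int_index] = round(x[int_index])  # integer variable, e.g. a day count
        f = objective(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f


def best_of_restarts(objective, bounds, int_index=0,
                     restarts=50, n_samples=200, seed=0):
    """Repeat the stochastic run and return the best (x, f) over all restarts."""
    rng = random.Random(seed)
    runs = [random_search(objective, bounds, int_index, n_samples, rng)
            for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])
```

Repeating a stochastic optimizer guards against a single run stalling in a poor region of the search space, at the cost of a proportional increase in objective evaluations.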

At this point we should note that the above optimization problem may in principle have more than one near-optimal solution, which may be attributed [...]

Since the complete lockdown of almost all activities was decided as of March 8, we have taken a 90% reduction in the corresponding "effective" transmission rate to reflect the drop in the per-day average contacts per person.
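The lockdown scenario amounts to a step change in the transmission rate. A minimal Python sketch follows, assuming that the reduction applies from the lockdown day onward and that time is counted in days from the DAY-ZERO reported in the paper (January 21); the function names are hypothetical.

```python
from datetime import date


def days_since(day_zero, day):
    """Whole days elapsed between DAY-ZERO and a calendar date."""
    return (day - day_zero).days


def effective_beta(t, beta0, lockdown_t, reduction=0.90):
    """Transmission rate at day t, cut by `reduction` from lockdown_t onward."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must lie in [0, 1]")
    # Assumption: the drop takes effect on the lockdown day itself.
    return beta0 * (1.0 - reduction) if t >= lockdown_t else beta0


DAY_ZERO = date(2020, 1, 21)   # DAY-ZERO reported for Lombardy
LOCKDOWN = date(2020, 3, 8)    # lockdown date used in the paper
lockdown_t = days_since(DAY_ZERO, LOCKDOWN)
```

With β = 0.779 before the lockdown, a 90% reduction leaves an "effective" rate of about 0.078 per day afterwards.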
Finally, we also attempt to forecast the fade-out of the outbreak.
For the optimization procedure, we set as initial guesses (the intervals within [...]

[...] (4), and that at the very first days of the epidemic S ≈ N and D ≈ 0, the Jacobian of the system as evaluated at the disease-free state reads: [...]

The eigenvalues (that is, the roots of the characteristic polynomial of the Jacobian matrix) dictate if the disease-free equilibrium is stable or not, that is, if an emerging infectious disease can spread in the population. In particular, the disease-free state is stable, meaning that an infectious disease will not result in an outbreak, if and only if the moduli of all the eigenvalues of the Jacobian J of the discrete-time system are bounded by one. Jury's stability criterion [20] (the analogue of the Routh-Hurwitz criterion for discrete-time systems) can be used to determine the stability of the linearized discrete-time system by analysis of the coefficients of its characteristic polynomial. The characteristic polynomial of the Jacobian matrix reads: [...] where a_3 = 1 [...]

The necessary conditions for stability read: [...]

The sufficient conditions for stability are given by the following two inequalities: [...] where [...]

It can be shown that the second necessary condition (14) and the first sufficient condition (15) are always satisfied for the range of values of the epidemiological parameters considered here.

The first inequality (13) results in the necessary condition: [...]

It can also be shown that, for the range of the parameters considered here, the second sufficient condition (16) is satisfied if the necessary condition (18) is satisfied. Thus, the necessary condition (18) is also a sufficient condition for [...]

Note that for ε = 0, the above expression simplifies to R_0 for the simple SIR model.
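The stability argument can be checked numerically. The Python sketch below assumes a linearization over the infected compartments (E, I, I_c) of the form J = [[1 − σ, β, 0], [σ, 1 − δ − ε, 0], [0, ε, 1 − δ_c − γ]], which is an assumption consistent with the parameter definitions and with S ≈ N, D ≈ 0, and it takes R_0 = β / (δ + ε), the form that reduces to the SIR expression β/δ for ε = 0. The function names and the values of δ and γ are illustrative, not taken from the text.

```python
import math


def basic_reproduction_number(beta, delta, eps):
    """R_0 = beta / (delta + eps) under the assumed model structure."""
    return beta / (delta + eps)


def spectral_radius(beta, sigma, delta, delta_c, gamma, eps):
    """Largest eigenvalue modulus of the assumed 3x3 Jacobian.

    The matrix is block lower-triangular: a 2x2 block for (E, I) plus the
    scalar 1 - delta_c - gamma for I_c, so the eigenvalues are closed-form.
    """
    a, d = 1.0 - sigma, 1.0 - delta - eps
    trace, det = a + d, a * d - beta * sigma
    # Discriminant equals (a - d)^2 + 4*beta*sigma >= 0: real eigenvalues.
    root = math.sqrt(trace * trace - 4.0 * det)
    eigenvalues = [(trace + root) / 2.0, (trace - root) / 2.0,
                   1.0 - delta_c - gamma]
    return max(abs(ev) for ev in eigenvalues)


# beta, sigma and delta_c follow the text; delta and gamma are placeholders.
rates = dict(sigma=1 / 3, delta=1 / 7, delta_c=1 / 21, gamma=0.01, eps=0.05)
rho_before = spectral_radius(beta=0.779, **rates)    # before lockdown
rho_after = spectral_radius(beta=0.0779, **rates)    # after a 90% reduction
```

Under these assumptions the spectral radius exceeds one exactly when R_0 > 1, so the pre-lockdown rate yields an unstable disease-free state and the reduced rate a stable one.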

As discussed in the Methodology, we used five different values of ε (0.01, 0.05, 0.1, 0.15, 0.2) to assess the actual number of cases in the total population. Thus, for our computations, we ran the numerical optimization procedure 50 times and, for further analysis, we kept the value of ε that gave the smallest fitting error [...]

For the computed "effective" per-day transmission and mortality rates, we also report the corresponding 90% confidence intervals, instead of the more standard 95% CI, because of the small size of the data. Under this scenario, the DAY-ZERO for the outbreak in Lombardy was found to be the 21st of January.

The "effective" per-day transmission rate was found to be β = 0.779 (90% CI: 0.777-0.781) and the "effective" per-day mortality rate for the confirmed cases [...] the reported number of confirmed cases for that period.

Thus, according to the above results, on the 8th of March, the actual cumulative number of infected cases in the total population (taking into account the cases exposed to the virus) was of the order of 15 times the confirmed cumulative number of infected cases.


Discussion
The crucial questions about an outbreak are how, when (DAY-ZERO) and why it started, and when it will end. Answers to these important questions would add critical knowledge to our arsenal to combat the pandemic. The tracing of DAY-ZERO, in particular, is of utmost importance. It is well known that minor perturbations in the initial conditions of a complex system, such as the ones of an outbreak, may result in major changes in the observed dynamics. No doubt, a high level of uncertainty for DAY-ZERO, as well as the uncertainty in the actual numbers of exposed people in the total population, raises several barriers to our ability to correctly assess the state and dynamics of the outbreak, and to forecast its evolution and its end. Such pieces of information would lower the barriers and help public health authorities respond fast and efficiently to the emergency.
This study aimed exactly at shedding more light on this problem, taking advantage of state-of-the-art tools of mathematical modelling and numerical analysis/optimization. To achieve this goal, we addressed a new compartmental SEIRD model with two infectious compartments in order to bridge the gap between the number of reported cases and the actual number of cases in the total population. By following the proposed methodological framework, we found that the DAY-ZERO in Lombardy was the 21st of January, a date that precedes by one month the date of the first confirmed case in Lombardy. Furthermore, our analysis revealed that the actual cumulative number of infected cases in the total population is around 15 times the cumulative number of confirmed infected cases. Importantly, based on our simulations, we predict that the fade-out of the outbreak in Lombardy will be by the end of May, if the strict isolation measures continue to hold.
At this point, we would like to make a final comment with respect to the basic reproduction number R_0, the significance and meaning of which are very often misinterpreted and misused, thereby leading to erroneous conclusions. Here, we found an R_0 ∼ 4, which is similar to the values reported by many studies in [...] However, we would like to stress that R_0 is NOT a biological constant for a disease, as it is affected not only by the pathogen, but also by many other factors, such as environmental conditions, the demographics as well as, importantly, the social behavior of the population (see for example the discussion in [22]).

Thus, a value for R_0 that is found in a part of the world (and even in a region of the same country) cannot be generalized as a global biological constant for other parts of the world (or even for other regions of the same country). Obviously, the environmental factors and social behavior of the population in Lombardy are different from the ones, for example, prevailing in Hubei.

We hope that the results of our analysis help to mitigate some of the severe consequences of the currently uncontrolled pandemic.

Funding
We did not receive any specific funding for this study.

The data used in this paper are given in the Supporting information.

References