Estimating the severity of COVID-19: Evidence from the Italian epicenter

We provide results on the level of COVID-19 excess mortality in the Italian region of Lombardy and in the province of Bergamo using official and original data sources. Since February 2020 Lombardy and in particular the province of Bergamo have been severely hit by the novel COVID-19 infectious disease. Combining official statistics, retrospective data and original data (i.e., obituaries and death notices) we provide a tentative estimate of the number of deaths either directly or indirectly, associated with COVID-19 as well as the total number of persons infected. Our findings suggest that the reported number of deaths attributable to COVID-19 identified by public authorities accounts only for one half of the observed excess mortality between March 2020 and previous years.


Introduction
The Lombardy region has been at the epicenter of the novel coronavirus (COVID-19) Italian epidemic. Since the detection in Rome of two COVID-19 positive tourists traveling from Wuhan on January 31, 2020, and the early detection of several cases in the northern area of the country, the disease spread for almost two months without much public intervention. On February 23, 2020, the first COVID-19-associated death was recorded in an hospital of Bergamo, a Lombardy's province. Despite the worrying signals, national politicians, public authorities, and local business industry confederations were dismissing the risks of COVID-19. See [1] for a detailed timeline of the key events of the COVID-19 Italian outbreak. By the end of March 2020, official reporting from national public authorities, available from the Civil Protection Department data website https://github.com/pcm-dpc/COVID-19/tree/master/dati-regioni, counted 7,593 COVID-19-associated deaths and 44,773 infected individuals, respectively the 57.7% and 40.5% of all the Italian cases.
Initial contributions have assessed the severity of COVID-19 providing either "crude" or adjusted-biased case fatality rate (CFR) measure based on model on infection dynamics [2][3][4][5]. Broadly speaking, these measures account for the share of deaths linked to a specific disease in comparison with the total number of individuals that have been diagnosed with that disease in a given period, and can be interpreted as the probability that a patient dies from the disease. These measures are important from an epidemiological point of view, since they allow an assessment of the healthcare capacity in response to an outbreak. However, these measures have limitations. In particular, there would be clear biases if one uses the aggregate numbers of reported cases and COVID-19-associated deaths to compute CFR estimate to understand the severity of the disease [6,7].
On the one hand, there might be an underestimation of the intensity during the first phases of an epidemic if several days pass between the moment an individual experiences the first symptoms and the day health authorities detect the infection, causing a censoring bias [8][9][10]. On the other hand, there might be an overestimate of the fatality rate if health authorities tend to concentrate only on severe cases, resulting in poor information on milder cases and causing an ascertainment bias. Similar bias will be present if the health-care system has reached its capacity constraint and is not able to detect further cases regardless of the severity. Adjusted estimates might as well face measurement issues as they mainly rely on the underlying assumptions on how to attenuate the bias. Despite the possible adjustments, however, these limitations make the CFR only a partial guidance to policymakers, mostly because it does not clearly reveal the actual consequences of an outbreak on the general population. To provide a more reliable evaluation of the severity of a disease in the general population it is common in the epidemiological literature to focus on the excess of mortality. This is usually measured looking at the rates of death in the population and compare it with a seasonal baseline level referring to periods in which the disease was not circulating [11][12][13].
Our contribution is to provide a first estimate of the excess of mortality rate in the general population in the Italian region of Lombardy during the months of the COVID-19 outbreak, and to assess the validity and the practical reliability of official figures combining different data sources. Specifically, we compare the official number of all-cause deaths reported in March 2020 with those reported in the same month of the previous year in a sample of Lombardy's municipalities. Next, with this result in hand, we recover the expected number of total persons infected for different levels of hypothetical CFR. These results add to the recent evidence on the excess of mortality associated with COVID-19 using data from other countries [14,15]. In addition, we show that the number of deaths associated to COVID-19 identified by public authorities are only 50% of the observed excess mortality rate between March 2020 and March 2019. Finally, we confirm this evidence by using an alternative method that we propose as a valid substitute for the use of unreliable (or unavailable) official data about population death rate.

Data collection
We collected data from several sources. We create a measure of excess of mortality which compares the number of all deaths observed in March 2019 to those of March 2020 for a sample of Italian municipalities in the region of Lombardy. The Italian National Statistical Institute (ISTAT) released information on the mortality rates for 548 of the 1,507 total Lombardy's municipalities, accounting for roughly 67% of Lombardy's total population. Data have been released on April 9, 2020. The population coverage at the provincial level for Lombardy province is: 72.66% for Bergamo, 67.61% for Brescia, 51.58% for Como, 73.38% for Cremona, 55.76% for Lecco, 75.98% for Lodi, 67.17% for Mantova, 83.97% for Milan, 60.93% for Monza-Brianza, 51.77% for Pavia, 47.23% for Sondrio and 25.54% for Varese. The sample contains municipalities which, between January 1 2020 and March 28 2020, registered an increase of the mortality rate of at least 20% compared to the period 2015-2019. For more information see https://www.istat.it/it/archivio/240401. We merge this information with data on COVID-19 associated deaths at the municipality level obtained from the Lombardy Region website on March 30, 2020. Data refer to the status reported by regional public authorities on the March 30, 2020. The list contains, for each Lombardy's municipality, the number of dead patients who tested positive during hospitalization or post mortem. We use these variables in section 3.1, to compare the reported number of COVID-19 associated deaths with the actual number of excess of deaths observed in the same period.
In section 3.1.1 we use municipal mortality data that were collected through a survey sent by the two main local newspapers L'eco di Bergamo and Giornale di Brescia and the data analysis firm inTwig to all register offices of municipalities belonging to Bergamo and Brescia provinces. As of March 31, 2020, our final dataset contains 332 municipalities, of which 139 (resp. 193) refer to Bergamo (resp. Brescia) province, and represent 71% (resp. 95%) of the total province population.
In Section 3.3 we use an alternative measure of mortality rates. We digitalized obituaries published by L'Eco di Bergamo that is the most read and circulated daily newspaper in the province of Bergamo. Our final dataset contains 3,083 unique individuals from March 1 to March 31, 2020.
Finally, in Section 3.4, we collected data for individuals who tested positive to COVID-19 in the Italian region of Lombardy in March 2020. The available data do not identify the actual day of death. We only observe whether an individual in the sample was dead or alive at the moment we collected the data, i.e., on March 31, 2020. The dataset contains a total of 43,176 cases.

Hypothetical case fatality rate
To assess the actual infection rate, in section 3.2 we perform a back-of-the-envelope computation using different fatality ratios as reported in [5] (Table 1 of the cited manuscript) for each age group in Lombardy. For details about the modelling assumptions which are assumed for the infection dynamics and for modelling the relations between infection and mortality rates one can refer to [2] and [5] methodological sections. Under the restrictive assumption that infections follow a uniform distribution across each age group and that the age-specific theoretical fatality ratios reported in [5] are valid also for Lombardy, we compute the hypothetical fatality rate multiplying the theoretical fatality ratio (first column of Table 1) by the share of population for each age group (second column of Table 1). We report the results of this computation in the last column of Table 1. Data on age groups are provided by the Italian National Statistical Institute at https://www.istat.it/en/population-and-households?data-andindicators. Next, summing up together all the elements of the third column of Table 1, we find an aggregate hypothetical fatality rate of approximately 1.5. Finally, in Section 3.2, we use this hypothetical mortality rate to compute the share of population likely to be infected, by dividing for each province the excess mortality rate (i.e., excess mortality over total population in 2019) by 0.015.

How severe is under-reporting in COVID-19 fatality rate and diffusion in Lombardy?
Comparing fatality rates across years.  Narrowing down our focus, Fig 2 reports the distribution of mortality rates for Lombardy's provinces. The difference between the COVID-19 -associated deaths and our measure of excess deaths is glaring and observed across provinces. Fig 3 reports the geographic distribution of unexplained excess mortality rates for each municipality in Lombardy. Darker shades of blue refer to municipalities with higher rates of unexplained excess mortality. A coarse look at Fig 3 suggests the existence of a strong spatial correlation between the most hit municipalities, in line with the hypothesis that COVID-19 spreads through human-to-human close interaction.
Why under-reporting?. There are several possible explanations for this form of underreporting observed in the recorded number of COVID-19-associated deaths. A first may be attributable to the higher fatality ratio in older age groups, in line with [2], a finding that we also point out in Section 3.4. Further, infected patients may die at home without having the ability to be hospitalized and therefore be tested. Presumably, this issue exacerbates for older patients, who are more likely to have a bad pre-existent clinical condition. Connected to this issue, the mere capacity constraint on the number of tests that laboratories can run per day may compromise the correct assessment of the total number of infected patients. This evidence suggests that the actual number of COVID-19 -associated deaths and, therefore, the exact number of infected individuals, is much higher than those reported by public authorities. It is worth noting, however, that the residual excess mortality may also be explained by factors other than COVID-19, such as an excessive stress of the public health system. These explanations are consistent with a recent report published by ISTAT on May 4, 2020 available at https://www.istat.it/it/files//2020/05/Rapporto_Istat_ISS.pdf. For instance, has been registered a reduction of hospitalization for acute myocardial infarction in the northern regions of Italy of about 50% compared to March of the previous year [16]. On the other hand, the likelihood of dying for causes unrelated to COVID-19 (e.g. car or workplace accidents, pollution-related diseases) should be substantially lower during a lockdown period. In fatc, on February 23, 2020, the regional authorities mandated a reduction in public activities (e.g. suspension of sport events and concerts, partial or complete shutdown of all non-necessary commercial activities, etc.), resulting in a substantial decrease in traffic density and air pollution according to the European Space Agency (see http://www.esa.int/ESA_ Multimedia/Videos/2020/03/Coronavirus_nitrogen_dioxide_emissions_drop_over_Italy#. Xoh12NOLYiI.link).

A closer look: Bergamo and Brescia.
Since the data we are analyzing refer to a selected sample of municipalities which experienced an abnormal increase in mortality rates, we use an alternative source of information to exclude potential sample selection biases and focus only on the two mostly hit provinces. Also in this case, we observe an important increase in the number of deaths. At the aggregate level, we report an increase going from 1,635 deaths in March 2019 to 7,680 deaths in March 2020, equivalent to a 370% increase in the average municipality mortality rate compared to 2019 or previous years. The results are comparable when we look at previous years, as the average number of deaths for the period 2012-2018 was 1,619. Therefore, compared to the other provinces, Bergamo and Brescia experienced a much higher increase in the excess of death. Of these 6,045 excess deaths, only 2,948 deaths have been officially linked to COVID-19, implying that COVID-19-related fatalities account approximately only for half of the observed excess deaths. This proportion is similar to the one we find when using the data from ISTAT at the 28th of March.

A tentative estimate of the actual infections
We report in Table 3 the share of population that is likely to be infected for different fatality ratios (ranging between 1% and 2%) and each Lombardy's province. In Panel A we report the results for the provinces of Bergamo and Brescia, considering the whole month of March. In the province of Bergamo the share of infected population is estimated to be between 20% and 40%, while in the province di Brescia between 11% and 22%. In Panel B we focus on the data provided by the Italian statistical office. Using these numbers, the real share of infected individuals in Lombardy should range between 6% and 12% of the population. Finally, Fig 4 reports the geographic distribution of the hypothetical share of infected population based on a theoretical mortality rate of 1.5. Darker shades of blue refer to municipalities with higher shares of infected population. Also in this case, darker areas are concentrated around the provinces of Bergamo, Lodi and Cremona and seem to be spatially correlated with each other.

An alternative source of data: Obituaries and death notices
Because mortality data may not be immediately available during a pandemic peak, we also use obituaries or death notices published in local newspapers as an alternative source for measuring the number of deaths. Obituaries always contain individual characteristics such as name, surname, age, gender, date of death and place of death (municipality) representing an important and additional source of information to understand the phenomenon better. In March 2019 the number of total deaths registered by official statistics in the entire province was 886 less than one-third of deaths reported this March in newspaper obituaries. Summary statistics show that the average age is 80.9 years, and males represent 59.8% of fatalities. Moreover, looking at the distribution by age group we observe that 0.55% is between 15-49 years, 2.49% between 50-59, 8.42% between 60-69, 27.61% between 70-79, 42.36% between 80-89 and  Notes: This table reports, for each province, the share of population that is likely to be infected for a range of different hypothetical COVID-19 fatality rate. In Panel A we report the estimates using the data collected by newspapers of a sample of 332 municipalities of the province of Bergamo and Brescia covering, respectively, 71% and 95% of the total province population. In Panel B estimates are based on a sample of 548 municipalities provided by the Italian statistical office, covering 67% of the regional population.
https://doi.org/10.1371/journal.pone.0239569.t003 finally 18.54% has more 90 or more years. There has been a significant increase in the daily number of published obituaries after the first week of March being constantly over 100, while during the previous year the daily number was on average around 20. In Fig 5, we plot the daily evolution of obituaries (dashed line) published from February 24, 2020 to April 11, 2020, as well as the daily number of official deaths (solid line) published by ISTAT. Considering the period going from February 24, 2020 to March 28, 2020 the correlation between official deaths and obituaries is 0.98. Restricting our analysis to the municipalities in the province of Bergamo for which ISTAT released the official data of mortality, we observed that on average obituaries account for 60% of the total registered deaths in the sample. Using this proportion, we can assume that the true number of deaths at the provincial level is roughly twice the number of obituaries published on the newspaper. Specifically, we can estimate that the total number of deaths in March 2020 is 5,138, while the total number of deaths in March 2019 is 1,476, and the excess mortality with respect to March 2019 equals 3,662 (i.e., more than a 400% increase). Using the same reasoning of section 3.1, these additional deaths are probably attributable to COVID-19.
Considering a fatality rate of 1.4%-which is the fatality ratio adapted to the provincial age structure using [5]-the true number of cases should be 260,000 in the province of Bergamo that has 1.1 million inhabitants. This would imply that 24% of the provincial population was infected. This ratio is not too far away from what we find when calculating the excess of mortality by using the data from ISTAT (28%) or the data from the newspapers' survey (29%). Considering more conservative estimates of the fatality rate of 1.1% the number of cases raises to 330,000, i.e. 30% of the province's population. Overall, estimates about the spread of the virus using obituaries and death notices are consistent with what reported in Table 3 where different source of data were used.

Suggesting evidence using individual-level data
We show in this section some graphical results using the "crude" measure of CFR. Despite the kind of biases we are facing using this measure, we can still provide informative elements to assess the relative risks between different demographic groups. Because of possible issues related with measurement errors and sample selection, we decided no to comment the absolute numbers. In principle, all individuals that belong to the sample of detected positive had faced the same potential selection bias. To account for the possible delay between COVID-19 detection and patient's recovery or death (i.e., data censoring), we report graphical evidence about the variation of individual death risks over time, starting from the day a patient tested positive, and for different demographic characteristics.
In Fig 6 we report such results by gender, while in Fig 7 we focus on different age groups. In line with previous studies-e.g. [2]-we find a higher severity of the disease in older age groups as well as in male individuals. Also, individual death risk increase over time since the day of COVID-19 detection.

Concluding remarks
Since the spread of the novel COVID-19 emerged in Lombardy in early February 2020, thousands of patients have died, especially in the Province of Bergamo. Understanding the diffusion and assessing the fatality rate of COVID-19 is crucial for implementing effective public and health policies in tackling the disease. Unfortunately, public authorities' reporting and statistics significantly underestimated the "true" numbers of cases and COVID-19-associated deaths since there exists a vast proportion of infected patients who are not detected. Such underestimation may be the result of the policy adopted by the Lombardy region authorities, mandating that only patients who experience severe clinical symptoms are eventually tested with reliable diagnostic methods (e.g. nasopharyngeal swab) because of the limited capacity of the microbiology laboratories during the pandemic peak. Since February 26, 2020 until now, the Lombardy health authorities prescribe testing only those patients who experience symptoms or who have been in contact with an infected patient [1].
In this contribution, combining official statistics, retrospective and original data we gauge the reliability of official data published during a pandemic peak by combining multiple data sources and providing a tentative estimate of the "true" number of COVID-19 -associated deaths, as well as the total number of persons infected. Our estimates, not free from limitations, suggest that the reported mortality rate attributable to COVID-19 accounts only for one half of the observed excess mortality rate between March 2020 and March 2019. However, part of the residual excess mortality may also be explained by factors other than COVID-19, such as an excessive stress of the public health system.