Estimation of SARS-CoV-2 mortality during the early stages of an epidemic: a modelling study in Hubei, China and northern Italy

Background. The epidemic of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that originated in Wuhan, China in late 2019 is now pandemic. Reliable estimates of death from coronavirus disease 2019 (COVID-19) are essential to guide control efforts and to plan health care system requirements. The objectives of this study are to: 1) simulate the transmission dynamics of SARS-CoV-2 using publicly available surveillance data; 2) give estimates of SARS-CoV-2 mortality adjusted for bias in the two regions with the world's highest numbers of confirmed COVID-19 deaths: Hubei province, China and northern Italy. Methods and Findings. We developed an age-stratified susceptible-exposed-infected-removed (SEIR) compartmental model describing the dynamics of transmission and mortality during the SARS-CoV-2 epidemic. Our model accounts for two biases: preferential ascertainment of severe cases and delayed mortality (right-censoring). We fitted our transmission model to surveillance data from Hubei province (1 January to 11 February 2020) and northern Italy (8 February to 3 March 2020). Overall mortality among all symptomatic and asymptomatic infections was estimated to be 3.0% (95% credible interval: 2.6-3.4%) in Hubei province and 3.3% (2.0-4.7%) in northern Italy. Mortality increased with age; we estimate that among 80+ year olds, 39.0% (95%CrI: 31.1-48.9%) in Hubei province and 89.0% (95%CrI: 56.2-99.6%) in northern Italy died or will die. Limitations are that the model requires data recorded by date of onset and that sex-disaggregated mortality was not available. Conclusions. We developed a mechanistic approach to correct the crude CFR for bias due to right-censoring and preferential ascertainment and provide adjusted estimates of mortality due to SARS-CoV-2 infection by age group. While specific to the situation in Hubei, China and northern Italy during these periods, these findings will help the mitigation efforts and planning of resources as other regions prepare for SARS-CoV-2 epidemics.


Introduction
As of 2 March 2020, the 2019 novel coronavirus disease (COVID-19) epidemic that originated in Wuhan, China, has affected more than 60 countries and resulted in 88,948 confirmed cases and 3,043 deaths globally [1]. The transmission characteristics of COVID-19 appear to be similar to those of pandemic influenza and will likely facilitate further global spread [2]. During this early phase of a potential pandemic, it is critically important to obtain reliable estimates of the overall case fatality ratio (CFR), i.e., the proportion of all (asymptomatic and symptomatic) infected cases that will die as a result of the disease. Such estimates will help anticipate the expected morbidity and mortality due to COVID-19 and provide critical information for the planning of health care systems in countries that face an epidemic.
Obtaining reliable estimates of CFR can be challenging during the early phase of an epidemic [3,4]. A crude CFR of 2.3% was estimated based on 1,023 deaths out of 44,672 confirmed cases reported until 11 February 2020 [5]. Such "crude estimates" of CFR from the reported numbers of confirmed cases and deaths are difficult to interpret for two reasons: mild or asymptomatic cases are likely under-ascertained, and cases are right-censored because of the time delay from illness onset to death. Some analyses attempted to correct for right-censoring of deaths, leading to an estimate of CFR of 7.2% (95% confidence interval: 6.6%-8.0%) for Hubei province using a competing risk model [6]. Using data on exported cases and correcting for right-censoring of deaths that occurred in China, another team reported a CFR estimate of 5.3% (95% confidence interval: 3.5%-7.5%) among confirmed cases [7]. Finally, another team reported a CFR of 18% (95% credible interval: 11-81%) among cases detected in Hubei, accounting for the delay in mortality [8]. The same study provided adjusted estimates of the overall CFR based on data from the early epidemic in Hubei and from cases reported outside China at 1% (95% CI: 0.5%-4%). With the objective of correcting for both of these biases, we fitted a dynamic transmission model to reported data of confirmed cases and deaths in Hubei [9] and obtained adjusted and age-specific estimates of the overall CFR of COVID-19 among both symptomatic and asymptomatic patients in the Hubei province until 11 February 2020.

The COVID-19 epidemic in Hubei, China
The outbreak of COVID-19 appears to have originated from multiple zoonotic transmission events at the Huanan Wholesale Seafood market in Wuhan in early December 2019, with the animal source remaining unknown [10]. In early January 2020, a novel coronavirus (subsequently named SARS-CoV-2) was identified as the causal agent of the epidemic [11]. In a first phase of the epidemic, human-to-human transmission occurred at a high rate in Wuhan and other areas of the Hubei province, leading to an exponential growth of incidence (Figure 1A). On 20 January, Chinese authorities implemented strict control measures in the Hubei province, including contact tracing aimed at identifying, treating and isolating cases and quarantining contacts, extension of holidays, temperature checks before accessing public areas, cancellation of mass gatherings and the promotion of extreme social distancing [10]. Three days later, a cordon sanitaire was imposed, with strict traffic restrictions. From 27 January, the daily incidence of cases by disease onset started plateauing, then decreased. The Chinese CDC published a description of the epidemiological characteristics of cases reported until 11 February 2020, including the age distribution of cases and deaths reported up to this point in China, which we applied to Hubei (Figure 1B).

An age-structured model of COVID-19 transmission and mortality
We simulated the dynamics of the COVID-19 epidemic in Hubei from 1 January 2020 to 11 February 2020. We used an age-stratified susceptible-exposed-infected-removed (SEIR) compartmental model, with a distinction between asymptomatic and symptomatic infections. We stratified the population by 10-year ranges, leading to 9 age classes (0-9 years old, ..., 80 years old and more). After an incubation period of 5.6 days [12], 49% of infected people develop symptoms and become infectious [13], while the remainder stay asymptomatic and do not transmit the disease further. This proportion of 49% symptomatic infections was estimated by testing every passenger on the Diamond Princess ship (Figure 1C). We then fixed the time from onset to removal at 2.9 days [14]. These parameters are identical across all age classes.

Figure 2: Schematic description of the COVID-19 transmission model. We considered five compartments for each age group k: susceptible S_k, exposed E_k, symptomatically infected I_k, asymptomatically infected A_k, and removed R_k. The cumulative incidence of symptomatic cases is recorded in compartment C_k, from which we compute the daily incidence of reported cases J_k and the daily incidence of deaths D_k. Model parameters: transmission rate β, incubation rate τ, probability of symptomatic infection ψ, removal rate μ, reporting rate of symptomatics ρ_k (by age group), probability of death among symptomatics (by age group k), and delay from disease onset to death γ (discretized by day).
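The compartmental structure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Stan implementation: it assumes homogeneous mixing across age groups, a simple forward-Euler integration, and that asymptomatic infections leave their compartment at the same rate μ as symptomatic ones. The function names (`seir_step`, `simulate`), the seeding of one exposed person per age group, and the value of the transmission rate used in any call are hypothetical.

```python
# Sketch of the age-stratified SEIR model with symptomatic (I) and
# asymptomatic (A) infections. Fixed parameters come from the text;
# everything else is an assumption made for illustration.
TAU = 1 / 5.6   # incubation rate (incubation period of 5.6 days)
PSI = 0.49      # probability that an infection becomes symptomatic
MU = 1 / 2.9    # removal rate (2.9 days from onset to removal)


def seir_step(state, beta, dt):
    """Advance all age groups by one Euler step of length dt (days)."""
    n_total = sum(sum(state[c]) for c in "SEIAR")
    # Only symptomatic individuals transmit; mixing is assumed homogeneous.
    force = beta * sum(state["I"]) / n_total
    new = {name: list(values) for name, values in state.items()}
    for k in range(len(state["S"])):
        s, e = state["S"][k], state["E"][k]
        i, a = state["I"][k], state["A"][k]
        infections = force * s      # S_k -> E_k
        onset = TAU * e             # E_k -> I_k or A_k
        new["S"][k] = s - dt * infections
        new["E"][k] = e + dt * (infections - onset)
        new["I"][k] = i + dt * (PSI * onset - MU * i)
        new["A"][k] = a + dt * ((1 - PSI) * onset - MU * a)
        new["R"][k] = state["R"][k] + dt * MU * (i + a)
        # C_k accumulates symptomatic cases and is never depleted.
        new["C"][k] = state["C"][k] + dt * PSI * onset
    return new


def simulate(pop_by_age, beta, days, seed_exposed=1.0, dt=0.1):
    """Run the model from a hypothetical seeding of exposed individuals."""
    k = len(pop_by_age)
    state = {
        "S": [n - seed_exposed for n in pop_by_age],
        "E": [seed_exposed] * k,
        "I": [0.0] * k,
        "A": [0.0] * k,
        "R": [0.0] * k,
        "C": [0.0] * k,
    }
    for _ in range(round(days / dt)):
        state = seir_step(state, beta, dt)
    return state
```

Daily incidence of symptomatic cases per age group could then be obtained by differencing `C` between consecutive days, before applying the reporting rate and the death probability.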
The author/funder has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

[...] and deaths that will occur after this date. On panels B and D, numbers are scaled by the Chinese age distribution and by the number reported for the highest age group (80+).
We modelled the decrease in the transmission of SARS-CoV-2 due to the progressive implementation of control measures from 20 January by using a sigmoid function for the transmission rate. It includes four parameters: the initial transmission rate, the decrease in transmission due to the control measures, the time delay between implementation and effect, and the slope of this decrease. We assumed that symptomatic people had an age-specific case-fatality rate and that the time from onset to death followed a log-normal distribution with mean 20.2 days and standard deviation 11.6 days [12]. Thus, we could estimate the number of deaths at each time point and account for the deaths that occurred after 11 February 2020.
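The two ingredients of this paragraph can be illustrated with a short Python sketch. The exact sigmoid parameterization is not given in the text, so the form below (a logistic drop from `beta0` to `eta * beta0`) is an assumption, as are the names `transmission_rate`, `eta`, `nu`, `xi` and the choice to index 1 January as day 0. The log-normal parameters are derived from the stated mean (20.2 days) and standard deviation (11.6 days) by moment matching, and the distribution is discretized by day using differences of the cumulative distribution function, which is one of several possible conventions.

```python
import math


def transmission_rate(t, beta0, eta, t_control, nu, xi):
    """Sigmoid transmission rate (assumed form).

    beta0: initial transmission rate
    eta: relative transmission remaining after control measures
    t_control: day on which measures are implemented
    nu: delay between implementation and effect (days)
    xi: slope of the decrease
    """
    z = xi * (t - t_control - nu)
    return beta0 * (eta + (1.0 - eta) / (1.0 + math.exp(z)))


def lognormal_params(mean, sd):
    """Convert mean and sd on the natural scale to log-scale parameters."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def lognormal_cdf(x, mu, sigma):
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def daily_death_delay(max_days, mean=20.2, sd=11.6):
    """Probability that death occurs between day d and day d + 1 after onset."""
    mu, sigma = lognormal_params(mean, sd)
    return [
        lognormal_cdf(d + 1, mu, sigma) - lognormal_cdf(d, mu, sigma)
        for d in range(max_days)
    ]


if __name__ == "__main__":
    # With 1 January as day 0, 20 January is day 19 (indexing assumption).
    print(transmission_rate(t=30, beta0=1.0, eta=0.1, t_control=19, nu=2, xi=1))
    delay = daily_death_delay(60)
    # Share of eventual deaths expected within 14 days of symptom onset.
    print(sum(delay[:14]))
```

The cumulative sum of `daily_death_delay` gives the share of eventual deaths already observed a given number of days after onset, which is the quantity needed to correct for right-censoring.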
We simultaneously fitted our model to four data sets: (1) the number of confirmed cases by day of disease onset from 1 January to 11 February, (2) the number of deaths by day of occurrence from 1 January to 11 February, (3) the age distribution of all confirmed cases until 11 February and (4) the age distribution of all deaths reported by 11 February. These data were extracted from the CCDC report [9]. We assumed that all deaths were reported and that all symptomatic cases among people aged 80 years and older were also reported. For the other age classes, we modelled the underreporting of symptomatic cases by an age-dependent reporting rate. We used negative binomial distributions to describe the number of reported cases and deaths and multinomial distributions to describe the distribution of cases and deaths over age classes. We implemented the model in a Bayesian framework using Stan [15]. All code and data are available from https://github.com/jriou/covid_adjusted_cfr.
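As a rough illustration of how the four data sets enter a single likelihood, the sketch below writes out the negative binomial and multinomial log-probabilities in plain Python. It is not the authors' Stan code: the mean and overdispersion parameterization of the negative binomial (variance = mean + mean²/phi), the dictionary keys, and the function names are assumptions chosen for readability, and priors are omitted.

```python
import math


def neg_binomial_logpmf(y, mean, phi):
    """Log-probability of count y under a negative binomial with the given
    mean and overdispersion phi. Requires mean > 0 and phi > 0."""
    return (
        math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
        + phi * math.log(phi / (mean + phi))
        + y * math.log(mean / (mean + phi))
    )


def multinomial_logpmf(counts, probs):
    """Log-probability of age-class counts given age-class probabilities."""
    out = math.lgamma(sum(counts) + 1)
    for y, p in zip(counts, probs):
        out -= math.lgamma(y + 1)
        if y > 0:  # skip empty classes so that log(0) is never evaluated
            out += y * math.log(p)
    return out


def joint_log_likelihood(obs, pred, phi_cases, phi_deaths):
    """Sum the contributions of the four data sets (hypothetical layout)."""
    ll = 0.0
    for y, m in zip(obs["cases_by_day"], pred["cases_by_day"]):
        ll += neg_binomial_logpmf(y, m, phi_cases)
    for y, m in zip(obs["deaths_by_day"], pred["deaths_by_day"]):
        ll += neg_binomial_logpmf(y, m, phi_deaths)
    ll += multinomial_logpmf(obs["cases_by_age"], pred["cases_age_share"])
    ll += multinomial_logpmf(obs["deaths_by_age"], pred["deaths_age_share"])
    return ll
```

In a Bayesian fit, this log-likelihood would be added to the log-prior and explored by the sampler; here `pred` stands for the model output at one candidate parameter set.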

Estimated case fatality ratio by age group during the COVID-19 epidemic in Hubei
Our model accurately describes the dynamics of transmission and mortality by age group during the COVID-19 epidemic in Hubei from 1 January to 11 February (Figure 2). Control measures implemented on 20 January led to a reduction of transmissibility by 99% (95% credible interval [CrI]: 97-100), with a decrease in case incidence after six days. Under the assumption that the risk of transmission of COVID-19 was homogeneous by age, so that the deficit of reported cases in the younger age classes can be attributed to surveillance bias, the total number of symptomatic cases was estimated at 74,200 (95%CrI: 67,000-81,600), 1.8 times the 41,092 cases reported during that period. Under the assumption that 49% of infections lead to symptoms, this implies that a total of 152,700 individuals (95%CrI: 137,800-167,900) were infected in the Hubei province during that period.
medRxiv preprint (which was not peer-reviewed), doi: https://doi.org/10.1101/2020.03.04.20031104

As of 11 February, 979 deaths had been reported among people infected before that date. Under our assumption regarding the distribution of the delay between disease onset and death, the model predicts a total of 2,441 deaths (95%CrI: 2,225-2,702) among all people infected before 11 February. This results in an adjusted CFR of 1.6% (95%CrI: 1.4-1.8) among all people infected by COVID-19 in Hubei during that period. Moreover, our adjustment leads to noticeable changes in the age-specific CFR (Table 1). Compared to the crude CFR, the adjusted CFR is even lower in the younger age classes (0-59 years old) but higher in people aged 60 and more.
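The headline figures reported in this section can be cross-checked with simple arithmetic on the point estimates. The sketch below uses only numbers quoted in the text; the variable names are illustrative. Ratios of posterior summaries only approximate the posterior of the ratio, so small discrepancies with the reported values (for example between 74,200 / 0.49 and 152,700) are expected.

```python
reported_cases = 41_092      # confirmed cases reported in the period
symptomatic_cases = 74_200   # estimated symptomatic cases
infections = 152_700         # estimated symptomatic + asymptomatic infections
deaths_reported = 979        # deaths reported by 11 February
deaths_predicted = 2_441     # deaths predicted among those infected by then
psi = 0.49                   # proportion of infections with symptoms

# Under-ascertainment of symptomatic cases (about 1.8).
ascertainment_ratio = symptomatic_cases / reported_cases

# Infections implied by the symptomatic proportion (close to 152,700).
implied_infections = symptomatic_cases / psi

# Share of eventual deaths already observed: the right-censoring effect.
share_deaths_observed = deaths_reported / deaths_predicted

# Adjusted CFR among all infections (about 1.6%) and among
# symptomatic infections only (about 3.3%).
cfr_all = deaths_predicted / infections
cfr_symptomatic = deaths_predicted / symptomatic_cases

print(f"ascertainment ratio: {ascertainment_ratio:.2f}")
print(f"implied infections: {implied_infections:,.0f}")
print(f"share of deaths observed: {share_deaths_observed:.2f}")
print(f"CFR, all infections: {100 * cfr_all:.1f}%")
print(f"CFR, symptomatic: {100 * cfr_symptomatic:.1f}%")
```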

Strengths and limitations
In this work, we propose a comprehensive solution to the estimation of CFR from surveillance data during outbreaks [3], and apply it to data from the COVID-19 epidemic in Hubei, China. Our work has three important strengths.
(1) We use a mechanistic model for the transmission of and the mortality associated with COVID-19 that is a direct translation of the data-generating mechanisms leading to the biased observations of the number of deaths (because of right-censoring) and of cases (because of surveillance bias). Our model also accounts for the effect of control measures on disease transmission. (2) Our model is stratified by age group, which has been shown to be a crucial feature for modelling emerging respiratory infections [16]. (3) The estimates rely on routinely collected surveillance data such as incident cases by disease onset, incident deaths, and the age distribution of cases and deaths, and do not require individual-level data or studies in the general population. The Bayesian framework allows the propagation of the uncertainty from data to the estimates.
Our work has several limitations. (1) Our results depend on the central assumption that the cause of the deficit of reported cases among younger age groups is a surveillance bias and does not reflect a lower risk of infection in younger individuals. The reason for this age shift is unknown [10]. Retrospective testing for COVID-19 of samples from influenza-like-illness surveillance found no positive test among children, but the sample sizes were small (20 per week including both adults and children) [10]. Uneven age distributions in the risk of infection can be attributed to immunological features, such as the lower circulation of H1N1 influenza in older individuals due to residual immunity [17]. An immunological explanation of the opposite phenomenon, with a lower susceptibility of younger individuals, seems unlikely, and there is no indication of pre-existing immunity to COVID-19 in humans [10]. Different contact patterns could play a role in a limited outbreak, but not in such a widespread infection, especially as household transmission seems to play a major role [10]. The remaining explanation, which we assume here, is that younger individuals, when symptomatic, have milder symptoms that decrease the probability of seeking care and being identified.
(2) In a related matter, our results depend on the assumption that older individuals have more severe symptoms and are more likely to be identified. In the absence of an outside reference point, the reporting rate cannot be estimated from surveillance data only. We chose to fix the reporting rate of infected individuals who have symptoms and are aged 80 and more at 100%, and to estimate the reporting rates in other age groups relative to that of older individuals. If further data, coming from a study in the general population, show that this assumption is violated, this would mean that our study overestimates the CFR.
(3) There is substantial uncertainty around the proportion of asymptomatic infections. Currently, the detection of asymptomatic patients in China is limited by the focus on symptomatic patients seeking care and the lack of seroprevalence data [18]. The proportion of symptomatic infections has been estimated at 58% (95% confidence interval: 33-83) in a small sample of cases exported to Japan [19]. During the outbreak on the ship "Diamond Princess", nearly all individuals were tested regardless of symptoms, leading to an average proportion of symptomatic infections of 49% in a sample of 619, which was used in the present study [13]. Still, uncertainty about the proportion of symptomatic infections will remain until a large retrospective seroprevalence study is conducted in the general population, and our results are dependent on this estimate. Additionally, the dichotomization of infection into asymptomatic and symptomatic is a simplification of reality; infection with SARS-CoV-2 will likely cause a gradient of symptoms in different individuals depending on age, sex and comorbidities [10]. The proportion of asymptomatic infections might show an age-dependent structure.
(4) Our findings regarding the CFR are specific to the context, and should be interpreted in that light. The findings describe the situation in Hubei from 1 January to 11 February 2020. It was demonstrated there that mortality rates changed over time as a result of an improvement of the standard of care [10]. The standard of care and, as a result, the CFR are setting-dependent, and our estimates cannot be directly applied to other contexts.
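Limitation (2) can be made concrete with a back-of-the-envelope calculation, which is not the Bayesian model itself. If the risk of symptomatic infection is the same in all age groups and reporting in the oldest group is fixed at a reference value, then the reporting rate in each group is its per-capita reported incidence relative to that of the oldest group. The function names and all numbers below are hypothetical and serve only to show the direction of the bias.

```python
def reporting_rates(reported, population, reference_rate=1.0):
    """Reporting rate per age group, anchored on the last (oldest) group.

    Assumes the same per-capita risk of symptomatic infection in every group.
    """
    per_capita = [r / n for r, n in zip(reported, population)]
    # Symptomatic attack rate implied by the oldest group and its reporting rate.
    attack_rate = per_capita[-1] / reference_rate
    return [x / attack_rate for x in per_capita]


def implied_symptomatic(reported, rates):
    """Symptomatic cases implied by reported counts and reporting rates."""
    return [r / rho for r, rho in zip(reported, rates)]


if __name__ == "__main__":
    # Hypothetical counts for three age groups (youngest to oldest).
    reported = [1, 5, 10]
    population = [100, 100, 50]
    for ref in (1.0, 0.5):
        rates = reporting_rates(reported, population, reference_rate=ref)
        total = sum(implied_symptomatic(reported, rates))
        print(ref, [round(r, 3) for r in rates], round(total, 1))
```

In this sketch, halving the reference reporting rate doubles the implied number of symptomatic cases and therefore halves the CFR for a fixed number of deaths, which is why a violation of the 100% assumption would mean that the CFR is overestimated.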

Conclusions
We developed a mechanistic approach to correct for the biases in the crude estimates of CFR and provide an adjusted CFR by age group with regard to the ongoing COVID-19 epidemic in Hubei, China between 1 January and 11 February. We find that 1.6% (1.4-1.8) of individuals infected with COVID-19 during that period, with or without symptoms, died or will die, with even larger differences by age group than suggested by the raw data. The probability of death among infected individuals with symptoms is estimated at 3.3% (2.9-3.8), with a steep increase over 60 years old to reach 36% over 80 years old. While specific to the situation in Hubei, China during this period, these findings will help the mitigation efforts and planning of resources as other regions prepare for COVID-19 epidemics.