Estimation of time-varying reproduction numbers underlying epidemiological processes: A new statistical tool for the COVID-19 pandemic

The coronavirus pandemic has rapidly evolved into an unprecedented crisis. The susceptible-infectious-removed (SIR) model and its variants have been used for modeling the pandemic. However, time-independent parameters in the classical models may not capture the dynamic transmission and removal processes, governed by virus containment strategies taken at various phases of the epidemic. Moreover, few models account for possible inaccuracies of the reported cases. We propose a Poisson model with time-dependent transmission and removal rates to account for possible random errors in reporting and estimate a time-dependent disease reproduction number, which may reflect the effectiveness of virus control strategies. We apply our method to study the pandemic in several severely impacted countries, and analyze and forecast the evolving spread of the coronavirus. We have developed an interactive web application to facilitate readers’ use of our method.

A recent work [35] demonstrated that R 0 is likely to vary "due to the impact of the performed intervention strategies and behavioral changes in the population".
The merits of our work are summarized as follows. First, unlike the deterministic ODEbased SIR models, our method does not require transmission and removal rates to be known, but estimates them using the data. Second, we allow these rates to be time-varying. Some timevarying SIR approaches [27] directly integrate into the model the information on when governments enforced, for example, quarantine, social-distancing, compulsory mask-wearing and city lockdowns. Our method differs by computing a time-varying R 0 , which gauges the status of coronavirus containment and assesses the effectiveness of virus control strategies. Third, our Poisson model accounts for possible random errors in reporting, and quantifies the uncertainty of the predicted numbers of susceptible, infectious and removed. Finally, we apply our method to analyze the data collected from the aforementioned GitHub time-series data repository. We have created an interactive web application (https://younghhk.shinyapps.io/ tvSIRforCOVID19/) to facilitate users' application of the proposed method.

A Poisson model with time-dependent transmission and removal rates
We introduce a Poisson model with time-varying transmission and removal rates, denoted by β(t) and γ(t). Consider a population with N individuals, and denote by S(t), I(t), R(t) the true but unknown numbers of susceptible, infectious and removed, respectively, at time t, and by s (t) = S(t)/N, i(t) = I(t)/N, r(t) = R(t)/N the fractions of these compartments.

Time-varying transmission, removal rates and reproduction number
The following ordinary differential equations (ODE) describe the change rates of s(t), i(t) and r(t): with an initial condition: i(0) = i 0 and r(0) = r 0 , where i 0 > 0 in order to let the epidemic develop [36]. Here, β(t) > 0 is the time-varying transmission rate of an infection at time t, which is the number of infectious contacts that result in infections per unit time, and γ(t) > 0 is the time-varying removal rate at t, at which infectious subjects are removed from being infectious due to death or recovery [33]. Moreover, γ −1 (t) can be interpreted as the infectious duration of an infection caught at time t [37]. From (1)-(3), we derive an important quantity, which is the time-dependent reproduction number R 0 ðtÞ ¼ bðtÞ gðtÞ :

PLOS ONE
Time-varying SIR based Poisson model for COVID -19 To see this, dividing (2) by (3) leads to where (di/dr)(t) is the ratio of the change rate of i(t) to that of r(t). Therefore, compared to its time-independent counterpart, R 0 ðtÞ is an instantaneous reproduction number and provides a real-time picture of an outbreak. For example, at the onset of the outbreak and in the absence of any containment actions, we may see a rapid ramp-up of cases compared to those removed, leading to a large (di/dr)(t) in (4), and hence a large R 0 ðtÞ. With the implemented policies for disease mitigation, we will see a drastically decreasing (di/dr)(t) and, therefore, declining of R 0 ðtÞ over time. The turning point is t 0 such that R 0 ðt 0 Þ ¼ 1; when the outbreak is controlled with (di/dr)(t 0 ) < 0. Under the fixed population size assumption, i.e., s(t) + i(t)+ r(t) = 1, we only need to study i (t) and r(t), and re-express (1)-(3) as with the same initial condition.
Denote by β ¼ ðb 1 ; . . . ; b q 1 Þ and γ ¼ ðg 1 ; . . . ; g q 2 Þ the unknown parameters, by Z I (t) and Z R (t) the reported numbers of infectious and removed, respectively, and by z I (t) = Z I (t)/N and z R (t) = Z R (t)/N, the reported proportions. Also, denote by I(t) and R(t) the true numbers of infectious and removed, respectively at time t. We propose a Poisson model to link Z I (t) and Z R (t) to I(t) and R(t) as follows: We also assume that, given I(t) and R(t), the observed daily number {Z I (t), Z R (t)} are independent across t = 1, . . ., T, meaning the random reporting errors are "white" noise. We note that (9) is directly based on "true" numbers of infectious cases and removed cases derived from the discrete SIR model (6). This differs from the Markov process approach, which is based on the past observations. With (6), (7) and (8), R(t) and I(t) are the functions of β and γ, since Given the data (Z I (t), Z R (t)), t = 1, . . ., T, we obtain ðβ;ĝÞ, the estimates of (β, γ), by maximizing the following likelihood or, equivalently, maximizing the log likelihood function where C is a constant free of β and γ. See the S1 Appendix for additional details of optimization.

Summary of estimation and inference for β(t), γ(t), R 0 ðtÞ, I(t), R(t)
Estimation: Let N be the size of population of a given country. The date when the first case was reported is set to be the starting date with t = 1, i 0 = Z I (1)/N and r 0 = Z R (1)/N. The observed data are {Z I (t), Z R (t), t = 1, . . ., T}, obtained from the GitHub data repository website mentioned in the introduction. We maximize (10) to obtainβ ¼ ðb 0 ;b 1 ; . . . ;b q 1 Þ and γ ¼ ðĝ 0 ;ĝ 1 ; . . . ;ĝ q 2 Þ. The optimal q 1 and q 2 are obtained via cross-validation. We denote by

Analysis of the COVID-19 pandemic among severely affected countries
Since the first case of COVID-19 was detected in China, it quickly spread to nearly every part of the world [6]. COVID-19, conjectured to be more contagious than the previous SARS and H1N1 [48], has put great strain on healthcare systems worldwide, especially among the severely affected countries [49]. We apply our method to assess the epidemiological processes of COVID-19 in some severely impacted countries.

Data descriptions and robustness of the method towards specifications of the initial conditions
The country-specific time-series data of confirmed, recovered, and death cases were obtained from a GitHub data repository website (https://github.com/ulklc/covid19-timeseries). This site collects information from various sources listed below on a daily basis at GMT 0:00, converts the data to the CSV format, and conducts data normalization and harmonization if inconsistencies are found. The data sources include In particular, the current population size of each country, N, came from the website of WorldoMeter. Our analyses covered the periods between the date of the first reported coronavirus case in each nation and June 30, 2020. In the beginning of the outbreak, assessment of i 0 and r 0 was problematic as infectious but asymptomatic cases tended to be undetected due to lack of awareness and testing. To investigate how our method depends on the correct specification of the initial values r 0 and i 0 , we conducted Monte Carlo simulations. As a comparison, we also studied the performance of the deterministic SIR model in the same settings. Fig 1  shows that, when the initial value i 0 was mis-specified to be 5 times of the truth, the curves of i (t) and r(t) obtained by the deterministic SIR model (6) were considerably biased. On the other hand, our proposed model (9), by accounting for the randomness of the observed data, was robust toward the mis-specification of i 0 and r 0 : the estimates of r(t) and i(t) had negligible biases even with mis-specified initial values. In an omitted analysis, we mis-specified i 0 and r 0 to be only twice of the truth, and obtain the similar results.
Our numerical experiments also suggested that using the time series, starting from the date when both cases and removed were reported, may generate more reasonable estimates.

Estimation of country-specific transmission, removal rates and reproduction numbers
Using the cubic B-splines (8), we estimated the time-dependent transmission rate β(t) and removal rate γ(t), based on which we further estimated R 0 ðtÞ, I(t) and R(t). To choose the optimal number of knots for each country when implementing the spline approach, we used 5-fold cross-validation by minimizing the combined mean squared error for the estimated infectious and removed cases. Fig 2 shows sharp variations in transmission rates and removal rates across different time periods, indicating the time-varying nature of these rates. The estimated I(t) and R(t) overlapped well with the observed number of infectious and removed cases, indicating the reasonableness of the method. The pointwise 95% confidence intervals (in yellow) represent the uncertainty of the estimates, which may be due to error in reporting. Fig 3 presents the estimated time-varying reproduction number,bðtÞĝðtÞ À 1 , for several countries. The curves capture the evolving trends of the epidemic for each country. In the US, though the first confirmed case was reported on January 20, 2020, lack of immediate actions in the early stage let the epidemic spread widely. As a result, the US had seen soaring infectious cases, and R 0 ðtÞ reached its peak around mid-March. From mid-March to early April, the US tightened the virus control policy by suspending foreign travels and closing borders, and the federal government and most states issued mandatory or advisory stay-home orders, which seemed to have substantially contained the virus.
The high reproduction numbers with China, Italy, and Sweden at the onset of the pandemic imply that the spread of the infectious disease was not well controlled in its early phases. With the extremely stringent mitigation policies such as city lockdown and mandatory mask-wearing implemented in the end of January, China was reported to bring its epidemic under control with a quickly dropping R 0 ðtÞ in February. This indicates that China might have contained the epidemic, with more people removed from infectious status than those who became infectious.
Sweden is among the few countries that imposed more relaxed measures to control coronavirus and advocated herd immunity. The Swedish approach has initiated much debate. While some criticized that this may endanger the general population in a reckless way, some felt this might terminate the pandemic more effectively in the absence of vaccines [50]. Fig 3 demon-strates that Sweden has a large reproduction number, which however keeps decreasing. The "big V" shape of the reproduction number around May 1 might be due to the reporting errors or lags. Our investigation found that the reported number of infectious cases in that period suddenly dropped and then quickly rose back, which was unusual.  Around February 18, a surge in South Korea was linked to a massive cluster of more than 5,000 cases [51]. The outbreak was clearly depicted in the time-varying R 0 ðtÞ curve. Since then, South Korea appeared to have slowed its epidemic, likely due to expansive testing programs and extensive efforts to trace and isolate patients and their contacts [52].  . Estimated I(t), R(t), β(t), γ(t), and R 0 ðtÞ. The US (left) and China (right) are shown based on the data up to June 30, 2020. The blue dots and the red dashed curves represent the observed data and the model-based predictions, respectively, with 95% confidence interval. More broadly, Fig 3 categorizes countries into two groups. One group features the countries which have contained coronavirus. Countries, such as China and South Korea, took aggressive actions after the outbreak and presented sharper downward slopes. Some European countries such as Italy and Spain and Mideastern countries such as Iran, which were hit later than the East Asian countries, share a similar pattern, though with much flatter slopes. On the other hand, the US, Brazil, and Sweden are still struggling to contain the virus, with the R 0 ðtÞ curves hovering over 1. We also caution that, among the countries whose R 0 ðtÞ dropped below 1, the curves of the reproduction numbers are beginning to uptick, possibly due to the resumed economy activities.

An interactive web application and R code
We have developed a web application (https://younghhk.shinyapps.io/tvSIRforCOVID19/) to facilitate users' application of the proposed method to compute the time-varying reproduction number, and estimated and predict the daily numbers of active cases and removed cases for the presented countries and other countries; see Fig 4 for an illustration.
Our code was written in R [53], using the bs function in the splines package for cubic B-spline approximation, the nlm function in the stats package for nonlinear minimization, and the jacobian function in the numDeriv package for computation of gradients and hessian matrices. Graphs were made by using the ggplot2 package. Our code can be found on the aforementioned shiny website.

Discussion
The rampaging pandemic of COVID-19 has called for developing proper computational and statistical tools to understand the trend of the spread of the disease and evaluate the efficacy of mitigation measures [54][55][56][57]. We propose a Poisson model with time-dependent transmission and removal rates. Our model accommodates possible random errors and estimates a timedependent disease reproduction number, R 0 ðtÞ, which can serve as a metric for timely evaluating the effects of health policies.
There have been substantial issues, such as biases and lags, in reporting infectious cases, recovery, and deaths, especially at the early stage of the outbreak. As opposed to the deterministic SIR models that heavily rely on accurate reporting of initial infectious and removed cases, our model is more robust towards mis-specifications of such initial conditions. Applications of our method to study the epidemics in selected countries illustrate the results of the virus containment policies implemented in these countries, and may serve as the epidemiological benchmarks for the future preventive measures.
Several methodological questions need to be addressed. First, we analyzed each country separately, without considering the traffic flows among these countries. We will develop a joint model for the global epidemic, which accounts for the geographic locations of and the connectivity among the countries.
Second, incorporating timing of public health interventions such as the shelter-in-place order into the model might be interesting. However, we opted not to follow this approach as no such information exists for the majority countries. On the other hand, the impact of the interventions or the change point can be embedded into our nonparametric time-dependent estimates.
Third, the validity of the results of statistical models eventually hinges on the data transparency and accuracy. For example, the results of Chinazzi et al. [58] suggested that in China only one of four cases were detected and confirmed. Also, asymptomatic cases might have been undetected in many countries. All of these might have led to underestimation of the actual number of cases. Moreover, the collected data could be biased toward patients with severe infection and with insurance, as these patients were more likely to seek care or get tested. More in-depth research is warranted to address the issue selection bias.
Finally, our present work is within the SIR framework, where removed individuals include recovery and deaths, who hypothetically are unlikely to infect others. Although this makes the model simpler and widely adopted, the interpretation of the γ parameter is not straightforward. Our subsequent work is to develop a susceptible-infectious-recovered-deceased (SIRD) model, in which the number of deaths and the number of recovered are separately considered. We will report this elsewhere.

Conclusion
Containment of COVID-19 requires the concerted effort of health care workers, health policy makers as well as citizens. Measures, e.g. self-quarantine, social distancing, and shelter in place, have been executed at various phases by each country to prevent the community transmission. Timely and effective assessment of these actions constitutes a critical component of the effort. SIR models have been widely used to model this pandemic. However, constant transmission and removal rates may not capture the timely influences of these policies.
We propose a time-varying SIR Poisson model to assess the dynamic transmission patterns of COVID-19. With the virus containment measures taken at various time points, R 0 may vary substantially over time. Our model provides a systematic and daily updatable tool to evaluate the immediate outcomes of these actions. It is likely that the pandemic is ending and many countries are now shifting gear to reopen the economy, while preparing to battle the second wave of virus attack [59,60]. Our tool may shed light on and aid the implementation of future containment strategies.