Tracking R of COVID-19: A new real-time estimation using the Kalman filter

We develop a new method for estimating the effective reproduction number of an infectious disease (R) and apply it to track the dynamics of COVID-19. The method is based on the fact that in the SIR model, R is linearly related to the growth rate of the number of infected individuals. This time-varying growth rate is estimated using the Kalman filter from data on new cases. The method is easy to implement in standard statistical software, and it performs well even when the number of infected individuals is imperfectly measured, or the infection does not follow the SIR model. Our estimates of R for COVID-19 for 124 countries across the world are provided in an interactive online dashboard, and they are used to assess the effectiveness of non-pharmaceutical interventions in a sample of 14 European countries.


A.1 SIS Model
We now show that the estimator in the main text in Eq. (3) is also obtained when the dynamics of the disease follow the SIS model. The SIS model, again in discrete time, is given by The only difference from the SIR model in the main text in Eq.
(1) is that formerly infected individuals do not obtain immunity after recovery and instead again join the pool of susceptibles. As is well known, the basic reproduction number in the SIS model is the same as in the SIR model (e.g., Chowell and Brauer, 2009) and given by R (t) 0 = β t /γ. Since the law of motion for I t in the SIS model is the same as in the SIR model, we can repeat the same steps as in the benchmark analysis to arrive at the same estimator as in in Eq. (3).

A.2 Generalized SIR Model
In this section, we show that we can also obtain the estimator in Eq. (3) from a generalized version of the SIR model with stochastic shocks. Specifically, we consider the following generalized SIR model: Differently from the baseline model, we introduce random shocks v 1,t and v 2,t . The shocks are i.i.d., and the time-varying support of v 1,t is [0, S t−1 − β t I t−1 /N ], while the support of v 2,t is [0, I t−1 + β t I t−1 S t−1 /N ]. We also assume that E t−1 [v 1,t − v 2,t ] = 0, so that the conditional expectation E t−1 [I t ] coincides with the value for I t given by the noiseless SIR model. With these modifications, the model can capture rich patterns of infectious disease dynamics. For example, "super spreader events" can be modeled either as v 1,t shocks or as a spike in β t . The model can also capture richer forms of population structures than the baseline SIR model. For example, if individuals who are more infectious (e.g., those with more connections in a network model) are more likely to become infected first, that can be captured by assuming that β t becomes lower over time.
Defining the time-varying basic reproduction number as R (t) 0 = β t /γ, and R t ≡ R (t) 0 S t−1 /N , we obtain that where v t ≡ (v 1,t − v 2,t )/I t−1 . Taking expectations on both sides of the equation, we arrive at Hence, the generalized SIR model of the present section leads to the same estimator as the baseline SIR model. Finally, we note that if γ varies deterministically over time, the equation above remains essentially unchanged, the only difference being that γ is replaced by γ t . If γ t follows a non-degenerate stochastic process, then the estimator for E[R t ] would need to correct for the covariance between γ t and R t .

A.3 Foundation for the Local-Level Model
When estimating R t , we use a local-level specification for the growth rate of the number of infected individuals. In this section, we show that the local-level model arises naturally in the SIR model in the early stages of an epidemic when the transmission rate follows a random walk.
Specifically, consider the generalized SIR model in Section A.2 of the Supplementary Appendix. We now specialize the process for the transmission rate β t to be a random walk: with a given initial value β 0 > 0. We calculate that in the early stages of the epidemic when S t ≈ N . Defining the effective reproduction number early on in the epidemic as R t = β t /γ, we therefore have directly that the growth rate of I t follows a local-level model with gr(I t ) = γ(R t − 1) + v t R t = R t−1 +η t whereη t ≡ η t /γ. Provided that the distribution of v t can be approximated with a normal distribution, we directly obtain the specification in the main text in Eq. (5). Alternatively, to obtain an exact normal local-level model, we could assume that v 1,t = v 2,t = 0 (no shocks in the original model, just as in the baseline model in the main text in Eq. (1)) but that instead of observing the true growth rate gr(I t ), we only observe gr(I t ) + ε t where ε t is i.i.d. normally distributed mean-zero measurement error.

A.4 Gibbs-Sampling Algorithm
In this section we discuss how the parameters of the state-space model can be estimated with a Gibbs-sampling algorithm à la Carter and Kohn (1994). Besides being a natural robustness check to our methodology, this algorithm uses a different approach to ensure the non-negativity of R t . To use the Kalman filter, we need to estimate σ 2 and σ 2 η in the state-space model in Eq. (5) of the main text. For the Gibbs sampler, we break the model down into conditional densities from which we can sample iteratively. The algorithm is the following: 1. Conditional on σ 2 ε and σ 2 η , use the Kalman filter to infer the state vector R t ; 2. Conditional on the sequence of R t computed in the previous step, take samples of σ 2 ε and σ 2 η from their prior distributions; 3. Conditional on the new draws of σ 2 ε and σ 2 η , estimate R t ; 4. Verify that each element of R t is positive. If yes, store the draws of σ 2 ε and σ 2 η . If not, discard the draws and repeat step 2; 5. Compute the Kalman smoother; 6. Iterate forward for as many replications as needed.
Finally, we contrast the estimates of R t obtained with the Gibbs sampler to our baseline estimates. We obtain a correlation of 0.85 between the two sets of estimates. Credible intervals are highly correlated as well. Overall, we conclude that our estimates are similar across different statistical estimation approaches.

A.5 SEIR Model: Monte Carlo Simulation
Our estimation method uses a structural mapping between R t and gr(I t ) derived from the basic SIR model. While we can generalize the baseline SIR model to include stochastic shocks (Section A.2 of the Supplementary Appendix), and the estimator remains valid when the disease follows an SIS model (Section A.1 of the Supplementary Appendix), the model is nevertheless restrictive. In particular, it ignores incubation periods as well as transmission during the incubation period. These features are likely especially important when modeling COVID-19.
We now perform a simulation exercise to see how our estimator of R t performs in a richer model that accounts for these additional features. Specifically, we consider an SEIR model in which the exposed are infectious: Here, E t denotes the number of individuals that are exposed at day t, κ is the daily transition rate from exposed to infected, and ∈ [0, 1] measures the degree to which the exposed are less infectious than the infected. If = 0, the exposed are not infectious at all, and we obtain the benchmark SEIR model. If = 1, the exposed are as infectious as the infected, and the model is isomorphic to the standard SIR model. We calibrate the parameters following Wang et al. (2020) who apply the benchmark SEIR model (with = 0) to study the dynamics of COVID-19 in Wuhan. In particular, we use κ = 1/5.2 and γ = 1/18 as in Wang et al. (2020). Then, we set = 2/3, following Ferguson et al. (2020) who assume that symptomatic individuals are 50% more infectious than the asymptomatic (that is, −1 = 1.5). Finally, we choose β by targeting a basic reproduction number of R 0 = 2.6, again as in Wang et al. (2020). In the model above, R 0 is given by R 0 = β/γ + β /κ, implying β = R 0 γκ/(γ + κ). The formula yields β ≈ 0.12. Finally, we set S 0 = 11 × 10 6 (approximating the population size of Wuhan), E 0 = R 0 = 0, and I 0 = 1.
The Monte Carlo design is as follows. First, we simulate the deterministic system in Eq. (A.1) using the parameters above. Then, we calculate the growth rate in the true number of infected individuals, i.e., gr(I t ) = I t /I t−1 − 1. However, instead of knowing the true growth rate, the statistician is assumed to observe a noisy version of it given by Notes: Estimates of the effective reproduction number (R t ) when the true dynamics of the disease follow an SEIR model. We investigate two values for γ est. , the transition rate from infected to recovered, that are used when estimating R t . First, we use the correct value of γ est. = (γ −1 + κ −1 ) −1 ≈ 0.043. Second, we use a misspecified values of γ est. = 1/10. Average values from 10,000 Monte Carlo replications are shown. See text for more details.
gr(I t ) = gr(I t ) + ε t . Here, ε is an i.i.d. normal disturbance with mean zero and standard deviation of 0.10. The standard deviation of the disturbances is roughly equal to the range of the true growth rates. Hence, the amount of noise used in the simulation is fairly large. For each realization of the disturbances, we estimate R t using our method. As in our empirical application, only data after 100 total cases have been reached is used. We investigate two values for γ est. that are used when estimating R t . First, we consider a situation in which the statistician uses the correct time that individuals are infected, given by γ est. = (γ −1 + κ −1 ) −1 where γ and κ are the true parameter values of the SEIR model. Second, we investigate a case in which the statistician incorrectly thinks that individuals are infectious only for ten days (γ est. = 1/10). We repeat the process for 10,000 Monte Carlo replications.
The results of the Monte Carlo simulation are shown in Figure A.1. When the statistician uses the correct number of days that an individual is infectious (that is, taking into account the incubation time), the estimates of R t from our method are very close to their true theoretical values. That is in spite of the fact that our estimator for R t is derived assuming that the dynamics of the disease are described by an SIR model. However, we also show that if the statistician misspecifies the number of days than an individual is infectious (assuming 10 days instead of the true number of 23.2 days), the estimates of R t are substantially biased, especially in the early stages of the epidemic. As is to be expected, underestimating the number of days that an individual is infectious leads to a downwards bias in the estimates of R t early on in the epidemic (when R t > 1), and upwards bias when the true R t falls below one. Overall, the results imply that the new method performs well when estimating R t even when the true dynamics of the disease do not follow the SIR model, provided that the duration of infectiousness used in the estimation is sufficiently accurate.

A.6 Effects of Potential Data Issues
We now discuss the effects of various data issues on the performance of our estimator.
Reporting delays. In practice, data may be subject to significant reporting delays. For example, suppose that due to testing constraints there is a lag of days between the date that an individual becomes infected and the date on which the case is registered. In this case, the estimates of R t would also be subject to delay of days. If there are significant reporting delays, one may first obtain, say, one-week-ahead forecasts of new cases, and then use these forecasts to construct a time series for I t .
Imperfect detection. A natural worry with any estimator of R t is that it may be substantially biased if not all of infected individuals are detected. Given the simplicity of our estimator, we can analytically assess the effects of imperfect detection.
Suppose that the true numbers of susceptible, infected, and recovered individuals are given by S * t , I * t , and R t , respectively. Their evolution is the same as in the SIR model in the main text in Eq. (1). However, we only observe I t = α t I * t , where α t ≡ I t /I * t is the detection rate. In practice, α is typically less than one, although the mathematical calculation below does not require this.
With this notation, we have that since gr(α t ) × gr(I * t ) ≈ 0 at a daily frequency; the approximation is exact in continuous time. Using the approximation above and Eq. (2) in the main text, we therefore obtain that the bias of the estimator under imperfect detection is given bŷ We now discuss several cases of practical importance: • Constant detection rate (α t = α). If the detection rate is constant over time, then our estimator is unbiased, andR t = R t . Hence, for example, even if we only detect 10% of the infectives (but the fraction detected remains constant over time), the estimator remains unbiased. Note that if the number of tests increases over time, that is not inconsistent with α t = α given that the number of infected individuals is likely to be growing at the same time. • Constant growth in the detection rate (gr(α t ) = g α ). If the growth rate of α t is constant over time, then our estimate of R t is biased upwards if g α > 0 and downwards if g α < 0. Note, however, that we are often mostly interested in the trend of R t over time and whether the trend is affected by various policy interventions. The trend in R t is estimated accurately even if g α = 0. Intuitively, constant growth in the detection rate leads to a level bias, but the slope is still estimated correctly. • Detection rate converges over time (α t → α). The final case of interest occurs when the detection rate converges to a constant over time. For example, if everyone is detected towards the end of the epidemic, we would have α t → 1. Since our method uses Kalman-filtering techniques to estimate the growth rate of I t , transient fluctuations in α t would have a limited effect on the estimates of R t later on in the sample. Given that we are often precisely interested in the behavior of R t in the later stages of the epidemic (when the detection rate is likely fairly constant), our method would still yield reliable estimates.
To provide a quantitative illustration, we perform a small-scale Monte Carlo study. We choose the parameters of the simulation to match our empirical estimates. First, we use our empirical estimates of R for the world as a whole for the first 50 days of the sample as the true values of R. From these values of R, we calculate the true (but unobserved) values of the growth rate of the number of infected individuals as µ t ≡ gr(I * t ) = γ(R t − 1), using γ = 1/7. Finally, the observed growth rate-as seen by the statistician-is generated as gr(I t ) = gr(α t )(1 + µ t ) + µ t + ε t . We use the empirical estimate of σ 2 ε to simulate ε t shocks (i.i.d. draws from a normal distribution with mean zero). We then apply our estimator to the generated data on gr(I t ), using the same value of γ = 1/7 to back out R.
In the simulation we consider three underdetection scenarios: • Constant underdetection. Given our analytical results, it is immaterial what percentage of cases is detected, as long as that percentage is constant over time. • Ramp-up in testing. Next, we consider a situation in which-due to an increase in the number of tests performed-the fraction of detected infected individuals goes up from α 0 = 0.10 to α 14 = 0.15 (an increase of 50%) over a period of two weeks. After the two weeks, the detection probability remains constant at 0.15. Again, the precise values of α 0 and α 14 are irrelevant, and what matters is the growth rate in the detection rate. • Stochastic underdetection. Finally, we suppose that the detection rate satisfies gr(α t ) = φ gr(α t−1 ) + ν t where ν t is an i.i.d. normally distributed shock. Intuitively, the detection rate is assumed to be unconditionally constant, but its growth rate is stochastic and follows an AR(1) process. We set the persistence parameter to φ = 0.75 in order to allow for fairly long-lasting deviations from the average   detection rate. To ensure that the variance of gr(I t ) remains constant across simulations (to ensure an apples-to-apples comparison), we suppose that 50% of the noise in the observed growth rate comes from variation in α t , with the other 50% coming from the ε t shocks. (Here, by "noise" we refer to the variation in gr(I t ) that is not solely due to the variation in µ t , namely, gr(α t )(1 + µ t ) + ε t .) Denoting our estimates of the variance of the growth rate byσ 2 µ and the variance of the irregular component byσ 2 ε , we therefore set Var We draw the initial value for gr(α 0 ) from its unconditional distribution.
The results of the Monte Carlo simulation are summarized in Figure A.2. As seen in the top panel, the estimated average effective reproduction numbers are fairly close to their theoretical values in all three scenarios. In the Testing Ramp-Up scenario, the estimates are biased upwards at the beginning of the sample, but the amount of bias is quantitatively relatively small. In addition, the estimates converge to those in the other two scenarios quite quickly. In all three scenarios, the estimates are able to pick up the reversal in the trend of R that happens at around time 30. The bottom panel of the figure plots the average absolute error of the estimates. As expected, the estimates are least accurate in the Stochastic Undertesting case, and mostly within 0.25-0.30 of the true R in the Constant Underdetection and Testing Ramp-Up scenarios.
Finally, in Figure A.3 we provide Monte Carlo estimates of the coverage frequency of credible intervals obtained by our estimation procedure. As seen in the graph, the credible intervals in the Constant Undertesting and Testing Ramp-Up scenarios have good coverage properties, with conservative confidence bounds. The credible bounds are narrower in the Stochastic Underdetection scenario, resulting in lower than nominal coverage frequency. However, the size distortion appears not too severe, especially considering the small sample size.
Imported cases. Our estimates may be biased if the fraction of cases that is imported changes over time (the previous results on imperfect detection apply to misclassification because of imported cases, too). If the source of infections is known, it is possible to correct for the issue by simply not including imported cases when constructing the time series for I t .

A.7 Estimation Details
To estimate R t of COVID-19, we use Bayesian filtering methods. We employ the following strategy to calibrate the prior distributions. First, we estimate a local-level model for gr(I t ) using a frequentist Kalman filter with diffuse initial conditions. In particular, we estimate the following model: The model is the same as Eq. (5) in the main text, except for a slight simplification in notation.
The procedure yields maximum likelihood estimates of σ 2 ε (variance of the irregular component) and the signal-to-noise ratio q ≡ σ 2 η /σ 2 ε for each country in the sample (with σ 2 η denoting the variance of the level component). We then use the distribution of σ 2 ε andq across countries to calibrate the priors for the precision of the irregular component (1/σ 2 ε ) and the signal-to-noise ratio (q). To ensure that the priors are not too "dogmatic," we inflate the variance of the estimates by a factor of 3 when calibrating the prior distributions. We use a gamma prior for both the signal-to-noise ratio and the precision of the irregular component, and we calibrate the parameters of the gamma distribution by matching the expected value and variance of the gamma-distributed random variables to their sample counterparts. Finally, we use a fairly uninformative normal prior for the initial value of the smoothed growth rate. The resulting priors are given in Table A.1.
Intuitively, these priors shrink the estimates of the precision and signal-to-noise ratio for each country towards their grand mean (average across countries). Such Bayesian shrinkage ensures that the parameter estimates are well behaved even though the sample Notes: Priors used in the Bayesian estimation of R t . See text for description on how the priors for the precision of the irregular component (1/σ 2 ε ) and the signal-to-noise ratio (q ≡ σ 2 η /σ 2 ε ) are calibrated based on cross-country frequentist estimates. size for many countries is fairly small, and the data are often noisy. We use the Stan programming language (Gelman, Lee, and Guo, 2015) to specify and estimate the Bayesian model. In particular, we use the pystan interface to call Stan from Python.

A.8 Empirical Validation
In this section, we perform two empirical validation exercises to check the performance of our estimates in practice.
Since our estimates are based on data on new cases, they may be misleading if new cases are subject to significant measurement problems. To help assuage this concern, we now perform the following exercise. We ask whether current values of R t help predict future growth in deaths. Since deaths are likely to be measured more accurately, this exercise provides a test of whether our estimates contain meaningful information and are not contaminated by data problems.
Formally, we consider the following regression:

Growth in Deaths in One Week
Corr. = 0.61 Notes: Relationship between current estimates of the effective reproduction number (R t ) and the growth rate of the number of new deaths in one week. The data is aggregated to a weekly frequency. Both variables are residualized to subtract country fixed effects by performing the within transformation. Only data after the cumulative number of deaths reaches 50 is included in the scatter plot. We include all countries in the John Hopkins database for which we have at least 20 observations after the outbreak. We remove data for the week of 2020-04-13-2020-04-19 in China that contain a large number of deaths that were previously unrecognized. where i denotes a particular country, and t indexes calendar weeks. Although our original data is daily, we aggregate to a weekly frequency; otherwise, measures of the growth rate of new deaths are too noisy. In addition, we only include weeks after the cumulative number of COVID-19 deaths has reached 50. After imposing these sample restrictions, we are left with 270 country-week observations across 68 countries. Given that we have panel data, we can include country fixed effects α i to account for time-invariant unobserved heterogeneity (such as differences in average age-a key correlate of COVID-19 mortality (Verity et al., 2020)-or family structures). The relationship given above is predicted by the baseline SIR model. Letting CFR = d t /I t− denote the case fatality rate (assumed to be constant over time), with standing for the average time between becoming infected and death, we have that gr(d t ) = gr(I t− ), yielding the regression equation above. The relationship is shown in Figure A.4. In the scatter plot, both variables are residualized to remove country fixed effects. We observe a strong positive relationship be- Notes: Relationship between current estimates of the effective reproduction number (R t ) and value of the movement index two weeks ago (first principal component of the six movement categories in Google (2020)). The data is aggregated to a weekly frequency. Both variables are residualized to subtract country fixed effects by performing the within transformation. We include all countries in the John Hopkins database for which we have at least 20 observations after the outbreak.
tween the value of R t this week and the growth in deaths one week later (corr. = 0.61). In Figure A.5, we demonstrate that there is also positive correlation (corr. = 0.48) between R t and deaths two weeks later. We note that while the average medical duration from the onset of symptoms to death for COVID-19 is longer than two weeks (around 18 days, see , the duration from reported cases to deaths is likely to substantially shorter because of reporting delays. For example, Hortaçsu, Liu, and Schwieg (2020) assume that new cases of COVID-19 are reported with a lag of 8 days in their baseline calculations (5 days for symptoms to appear, consistent with the evidence from Lauer et al. (2020) and Park et al. (2020), as people are unlikely to be tested without exhibiting symptoms, and an additional 3 days to capture delays in obtaining test results, based on andecdotal reports from the US). Since deaths are likely reported in a timely manner, if new cases are reported with a lag of 8 days, we would expect an average duration of around 10 days (≈1.43 weeks) between reported cases and reported deaths.
As a second validation check, we ask whether our estimates of R t are correlated with past movement data, as it should be if the estimates are meaningful. For information on movement, we use aggregated smartphone location data collected by Google and published in their "COVID-19 Community Mobility Reports" (Google, 2020). Google provides data on percentage changes in movement for six types of places: (i) groceries and pharmacies; (ii) parks; (iii) transit stations; (iv) retail and recreation; (v) residential; and (vi) workplaces. Since the six categories are strongly correlated, we take the first principal component of the six categories (the first principal component explains 83.03% of the total variance in the data). We refer to the first principal component as the "Mobility Index." As before, we only consider weeks after the cumulative number of confirmed COVID-19 cases in the country reaches 100. After imposing these restrictions, we are left with 792 country-week observations over 100 countries. Note that the number of countries is less than 124 because Google does not provide mobility data for all countries in our original sample. As shown in Figure A.6, current estimates of R t are strongly correlated with the value of the mobility index two weeks ago (corr. = 0.63).
For both validation exercises performed in the present section, we include all countries for which we have at least 20 observations after the onset of the epidemic (100 cumulative cases of COVID-19 reached) and satisfy any additional sample restrictions, as outlined above. If we narrow the sample down to countries with more and higherquality data-such as the sample of European countries analyzed in the main text-the correlations generally become substantially stronger. Hence, we consider the tests of the present section to be conservative.

Bayesian Smoother Classical Smoother Bayesian Filter Classical Filter
Notes: Estimated effective reproduction number (R) for China: filtered and smoothed estimates, using both Bayesian and classical estimation procedures. The Bayesian estimates are given by our baseline estimation procedure, as explained in the text. The classical estimates are obtained by maximum likelihood estimation (with diffuse initial conditions). The smoothed estimates use information from the full sample, while the filtered estimates at time t only use information up to time t. 65% credible bounds shown by the shaded areas.

A.11 Power Analysis
In this section, we study the statistical power of the event-study analysis in the main text using a Monte Carlo simulation.
We now describe the design of the power study. Intuitively, we simulate data using a stochastic process that is calibrated to match the properties of the observed data. We then simulate a sharp drop in the effective reproduction number-say, because of a lockdown. We apply our estimator to the simulated data and ask how often this abrupt change is detected by the estimation procedure.
To simplify the notation, we use the parametrization in Eq. (A.2). Optimal nowcasts from the local-level model in the steady state can be written as (Muth, 1960;Shephard, 2015, Section 3.4): is the signalto-noise ratio, and ω is the steady-state Kalman gain. Hence, nowcast errors, m t − µ t , follow an AR(1) process with Given that the shocks ε t and η t are uncorrelated, the variance of nowcast errors is The design of the power analysis is as follows: 1. We set γ = 1/7, and calibrate the remaining parameters of the data-generating process (q and σ 2 ε ) using the median values of the empirical estimates in the main text. The resulting parameter values are given in Table A.2. 2. We simulate µ t = γ(R t − 1). We initially set µ 0 = 1/7, implying an effective reproduction number of 2. At time 1, we simulate an abrupt decline in R t by setting µ 1 = 1/14, yielding a new effective reproduction number of 1.5, or a decline of 25%. For 2 ≤ t ≤ 14, we simulate µ t as a random walk, as in Eq. (A.2). 3. We simulate the observed growth rate of the number of infected individuals, y t = gr(I t ), as y t = µ t + ε t where ε t is an i.i.d. normal random variable with mean zero and variance σ 2 ε . 4. We simulate the nowcasts m t . We draw the initial nowcast m 0 from a normal Notes: Parameter values used in the power analysis. The parameters values for q (signal-to-noise ratio) and σ 2 ε (variance of the irregular component) are given by the median estimates from the 14 European countries considered in the empirical analysis in the main text. The mean duration of infectiousness is assumed to be γ −1 = 7, and the Kalman gain ω is calculated from Eq. (A.3). distribution with mean 1/7 and variance given in Eq. (A.4), and simulate further values of m t by the recursion in Eq. (A.3). The estimated value of R t is then given by our estimator. 5. We repeat steps 2-4 for 14 times, to simulate data for 14 "countries", as in the empirical application, and obtain estimates of the effective reproduction number by averaging across the 14 "countries". 6. We repeat steps 2-5 for 10,000 Monte Carlo replications.
The results of the power analysis are shown in Figure A.9. We observe that in 95% of the simulations, the change in R t is detected as soon as two days after the drop in R t . Hence, the analysis in the main text appears sufficiently powerful to detect moderate changes in R t . The key reason why the analysis has high statistical power, even though the signal-to-noise ratio is quite low (see Table A.2) is that data from multiple countries are used to obtain cross-country averages. This feature of the estimation procedure reduces estimation error substantially. While the signal-to-noise ratio is fairly low, we also note that the weight placed on data is that are more than one-week old is only (1 − ω) 7 ≈ 15.2%. Hence, one week after the change in R t , the estimates of R t are based primarily on data received after the change in R t .
The power analysis in the current section is arguably conservative. Specifically, we assume that after the abrupt decline, R t follows a random walk rather than staying fixed at the new level. As a result, as time goes on, the estimates of R t become more "spread out" across simulations, as is visible towards the end of Figure   Days After Change in R Notes: Power study of the statistical analysis in the main text (effects of non-pharmaceutical interventions on R t , the effective reproduction number). We simulate an abrupt change in R t from 2.0 to 1.5 using a data-generating process that is calibrated to match our empirical estimates in the main text. We then apply our estimator to the simulated data and ask how often the change is detected by the estimation procedure. The solid line gives the average estimate of R t , while the shaded lines denote 65% and and 95% of simulations (in particular, the shaded area for 65% of simulations is given by the 17.5 and 82.5 percentiles of the estimated R t across simulations, and the shaded area for 95% of simulations is given by the 2.5 and 97.5 percentiles of the estimated R t 's). 10,000 Monte Carlo replications used.

A.12 Comparison With EpiEstim Estimates
In this section, we compare our estimates of R for COVID-19 with estimates obtained using the method of Cori et al. (2013); see also Thompson et al. (2019). The method proposed by Cori et al. (2013)-arguably the most widely used approach to estimating the effective reproduction number of an infectious disease-is implemented in a popular R package EpiEstim. First, we download estimates of R obtained using EpiEstim provided by Xihong Lin's Group in the Department of Biostatistics at the Harvard Chan School of Public Health (http://metrics.covid19-analysis.org/). These estimates, just as in our case, are obtained using data from the John Hopkins CSSE repository. The parametrization used in the EpiEstim estimation assumes a time window of 7 days, and a gamma-distributed serial interval with a mean of 5.2 days and a standard deviation of 5.1 days. Next, we merge these estimates to our full sample of estimates. We restrict attention to countries for which we have at least 20 contemporaneous estimates from each of the two different methods. That leaves us with 108 countries and, on average, 42.14 time-series observations per country. Finally, we calculate the Pearson correlation coefficient between the two estimates of R for each country.
The results of this exercise are given in Table A.3. The two sets of estimates are highly correlated, with the average correlation coefficient equal to 0.80 (median: 0.89). The interquartile range is 0.78-0.95. These findings suggest that the two estimates are highly consistent with each other, despite very different estimation methods and underlying assumptions.  Notes: The graph plots the estimated effective reproduction number (R t ) one week before and three weeks after public events are banned in a country. The original sample consists of 14 European countries studied by Flaxman et al. (2020). For the event-study graph, we restrict the sample to countries for which data on R t is available for the whole event window. Heteroskedasticity-robust confidence bounds are shown by the shaded areas. Notes: The graph plots the estimated effective reproduction number (R t ) one week before and three weeks after case-based measures are introduced in a country. The original sample consists of 14 European countries studied by Flaxman et al. (2020). For the event-study graph, we restrict the sample to countries for which data on R t is available for the whole event window. Heteroskedasticity-robust confidence bounds are shown by the shaded areas. Days Since Intervention Notes: The graph plots the estimated effective reproduction number (R t ) one week before and three weeks after school closures are ordered in a country. The original sample consists of 14 European countries studied by Flaxman et al. (2020). For the event-study graph, we restrict the sample to countries for which data on R t is available for the whole event window. Heteroskedasticity-robust confidence bounds are shown by the shaded areas. Notes: The graph plots the estimated effective reproduction number (R t ) one week before and three weeks after social distancing is encouraged in a country. The original sample consists of 14 European countries studied by Flaxman et al. (2020). For the event-study graph, we restrict the sample to countries for which data on R t is available for the whole event window. Heteroskedasticity-robust confidence bounds are shown by the shaded areas. Notes: Results of panel-data regressions of the (log of) effective reproduction number (R t ) on indicator variables that are equal to 1 after the introduction of a non-pharmaceutical intervention (NPI) and 0 before the introduction. The regressions are similar to those in Table 2 in the main text except that the intervention variables are included separately (one-at-a-time). See the main text for more details.

A.15 GATHER Checklist
Item # Checklist item Reported on page # Objectives and funding 1 Define the indicator(s), populations (including age, sex, and geographic entities), and time period(s) for which estimates were made.

3, 5 2
List the funding sources for the work. 13

Data inputs
For all data inputs from multiple sources that are synthesized as part of the study:

3
Describe how the data were identified and how the data were accessed. 3, 5

4
Specify the inclusion and exclusion criteria. Identify all adhoc exclusions.

5
Provide information on all included data sources and their main characteristics. For each data source used, report reference information or contact name/institution, population represented, data collection method, year(s) of data collection, sex and age range, diagnostic criteria or measurement method, and sample size, as relevant.

3, 8 6
Identify and describe any categories of input data that have potentially important biases (e.g., based on characteristics listed in item 5).

A8, A9
For data inputs that contribute to the analysis but were not synthesized as part of the study:

7
Describe and give sources for any other data inputs.

Reported on page #
For all data inputs:

8
Provide all data inputs in a file format from which data can be efficiently extracted (e.g., a spreadsheet rather than a PDF), including all relevant metadata listed in item 5. For any data inputs that cannot be shared because of ethical or legal reasons, such as thirdparty ownership, provide a contact name or the name of the institution that retains the right to the data.

Data analysis 9
Provide a conceptual overview of the data analysis method. A diagram may be helpful. 3, 4, A13

10
Provide a detailed description of all steps of the analysis, including mathematical formulae. This description should cover, as relevant, data cleaning, data preprocessing, data adjustments and weighting of data sources, and mathematical or statistical model(s).

11
Describe how candidate models were evaluated and how the final model(s) were selected.

12
Provide the results of an evaluation of model performance, if done, as well as the results of any relevant sensitivity analysis.

13
Describe methods for calculating uncertainty of the estimates. State which sources of uncertainty were, and were not, accounted for in the uncertainty analysis.

14
State how analytic or statistical source code used to generate estimates can be accessed. 2

Item # Checklist item
Reported on page #

15
Provide published estimates in a file format from which data can be efficiently extracted. 2

16
Report a quantitative measure of the uncertainty of the estimates (e.g. uncertainty intervals). 6, 8, 10

17
Interpret results in light of existing evidence. If updating a previous set of estimates, describe the reasons for changes in estimates. 6, 7, 9

18
Discuss limitations of the estimates. Include a discussion of any modelling assumptions or data limitations that affect interpretation of the estimates.

11, 12
Notes: GATHER checklist (gather-statement.org) to facilitate the evaluation and replication of our empirical analysis. References to numbers in the Supplementary Appendix are prefixed with an "A." Hence, for example, "A15" refers to page 15 in the Supplementary Appendix, while "15" refers to page 15 in the main text.