Change in global transmission rates of COVID-19 through May 6 2020

We analyzed COVID-19 data through May 6th, 2020 using a partially observed Markov process. Our method uses a hybrid deterministic and stochastic formalism that allows for time variable transmission rates and detection probabilities. The model was fit using iterated particle filtering to case count and death count time series from 55 countries. We found evidence for a shrinking epidemic in 30 of the 55 examined countries. Of those 30 countries, 27 have significant evidence for subcritical transmission rates, although the decline in new cases is relatively slow compared to the initial growth rates. Generally, the transmission rates in Europe were lower than in the Americas and Asia. This suggests that global scale social distancing efforts to slow the spread of COVID-19 are effective although they need to be strengthened in many regions and maintained in others to avoid further resurgence of COVID-19. The slow decline also suggests alternative strategies to control the virus are needed before social distancing efforts are partially relaxed.


Introduction
Since its initial outbreak in Wuhan, China in late 2019 and early 2020 [1], COVID-19 caused repeated rapid outbreaks across the global from February through April 2020. The extremely rapid spread of COVID-19 in China [2] does not appear to be an anomaly: the disease has shown a short doubling time (2.4-3.6 days) outside of China as well [3]. As of May 21, 2020, the virus caused 5,034,458 reported infections, and 328,730 deaths globally [4]. In response, most affected countries/regions have implemented strong social distancing efforts, such as school closures, working-from-home, shelter-in-place orders. As a result, the spread of COVID-19 slowed down substantially in some countries [5], leading to a flattening of the epidemic curve. As social distancing induces high costs to both society and individuals, plans to relax social distancing are discussed. However, changes in both the transmission rates and detection probabilities over time coupled with stochasticity due to reporting delays makes differentiating between truly subcritical dynamics and simply reduced transmission difficult.
In this report, we developed a deterministic-stochastic hybrid model and fitted the model to case incidence and death incidence time series data from 55 countries. Following the a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 approach suggested by King et al. [6], we use a (partially) stochastic model and base our estimates on incidence rather than cumulative incidence data. Using both case count and death count of data allowed us to disentangle changes in surveillance intensity from changes in transmission [3]. We found evidence for large decreases in the country-level transmission rates, in several of the worst-affected countries. Importantly, using data up to May 6, 2020, we computed 99% confidence intervals to test whether or not the data were consistent with subcritical dynamics (i.e. the reproductive number R was below 1 on May 6, 2020). Most countries showed large decreases in transmission rates over time, and more than half of studied countries have transmission rates below the epidemic threshold. On the other hand, many countries still appear to be showing rapid exponential growth. Given its highly contagious nature, COVID-19 can spread rapidly when strong social distancing measures are lifted, even partially [3]. Alternative strategies that can effective control the virus are needed when social distancing measures are relaxed.

Data
Case count and death data were downloaded from the Johns Hopkins GitHub repository (https://github.com/CSSEGISandData/COVID-19) through May 6, 2020. Data included aggregate counts of reported cases and deaths at the country level and contained no identifying information. Any country in the data that had more than 2000 cumulative cases and 100 deaths by May 6, 2020 was included. To minimize the effect of repatriated cases we started each time series on the first day when the cumulative number of cases exceeded 100. All data processing and model fitting were otherwise done on the incidence scale. To address obvious bulk-reporting issues in the data (e.g. sudden zeros in the data followed by very large numbers), we smoothed the data using Tukey's 3-median method. Because several countries had days with a single death surrounded by days with no deaths, which the smoothing method would set to zero, days with a single death were not replaced with smoothed values. The original data and the smoothed data used for estimation are shown in S1 Fig in S1 File.

Model
We model the spread of COVID-19 as a partially observed Markov process with real-valued states S (susceptible), E (exposed), I (infected), and R (removed) to describe the latent population dynamics, and integer-valued states C 0 (to be counted), Y 1 (counted cases), D 0:3 (dying), and Y 2 (counted deaths) to model sampling into the data. We use multiple states to model the counted deaths to produce an Erlang distribution with mean 21 days and standard deviation 11 days in the time to death based on previous estimates of the time to death [7]. The model and all of its parameters (Table 1) have time units of days. The latent population model is governed by the following ordinary differential equations, where χ(t) is the time-variable transmission rate and N = S + E + I + R is assumed to be fixed over the run of the model. λ EI is the rate at which exposed persons become infectious and λ IR is the rate at which cases recover (i.e. are either no longer infectious or die). At every time interval, we sample persons moving from the E to I states into a stochastic arm of the model that is used to calculate the likelihood of the data.
To relate the latent population model to data, we randomly sample individuals from the unobserved population into a stochastic process that models the random movement from infection to being either counted as an observed case or counted as an observed death. The number of persons sampled into the observation arm of the model over a time interval dt is given by and ω are the probabilities that an infected person will be counted or die respectively, ρ c (t) and ω c are the probabilities of being not sampled or not dying respectively, and g(u) is a stochastic function that maps from real-value u to integer g(u); it takes value buc with probability u mod 1 and value due with probability 1 − (u mod 1). In plain English, the model tracks the random fate of each newly infected person as they move from the exposed to infected state with respect to eventually being observed as a case and/or a death in data. At each time step of length Δt, the change in state space is given by, where F t |C 0 is a random variate from BinðC 0 ; 1 À expðÀ l Y 1 DtÞÞ, and H it is a random variate from Bin D iÀ 1 ; 1 À exp À 4 21 Dt À � À � . X i indicates the i th element of Multinomial random variate defined above. The rate λ Y1 determines the rate at which persons who will be counted are counted (i.e. lower values of λ Y1 mean a longer delay in cases showing up in the data). The values of Y 1:2 are set to zero at the beginning of every day such that they accumulate the simulated number of cases and deaths that occur each day. Both the transmission rate and detection

Country
Model probability are allowed to vary with time in the following way: where t f is the final time in the given dataset. In plain English, the transmission rate is constant up to some time, t ð1Þ w , where it linearly increases or decreases to the value χ f by time t ð2Þ w . The model is constrained such that t ð2Þ w < t f À 20, that is, the transmission rate must be constant for at least 20 days before the end of the data collection period. This constraint is in place to avoid overfitting the final transmission rate. The detection rate likewise has a linear increase or decrease beginning at time t ρ ; however, the increase or decrease continues to a fixed point that is 20 days before the final datum. Variation in the detection rate (i.e. the probability that an infected case will be counted in a fixed interval) over the course of the epidemic can strongly bias estimates of the population growth rate (derivation for the exponential growth case in S1 File); not allowing the detection probability to change over time could lead to discordance in the case count and death count time series.

Parameter estimation
We assume that the data are Negative Binomial-distributed, conditional on the simulated number of cases and deaths that occur in a given time interval, i.e. the number of cases in the i th observation period has density NegBin(Y 1 , � 1 ) and the number of deaths in the i th observation period has density NegBin(Y 2 , � 2 ). The Negative Binomial is parameterized such that the first argument is the expectation and the second is an inverse overdispersion parameter that controls the variance of the data about the expectation; as � i becomes large, the data model approaches a Poisson with parameter Y i . Both � 1 and � 2 were estimated from the data.
Parameter estimation had two distinct steps: model selection and computation of confidence intervals. In the model selection phase, the model was fit to the data using an iterated particle-filtering method, implemented in the pomp R library [8]. To optimize the likelihood of the data, we used 1500 particles in 125 iterations for each country. To determine whether or not 1500 particles was sufficient to minimize the variance in the estimated the mean likelihood at a given parameter value we ran 20 independent particles filters on a single data set and found that the average deviation from the mean was less than 0.1 log units suggesting that we would be able to detect differences in the log likelihood greater than 0.1 log units. The reported likelihoods were measured using 4500 particles at the optimized Maximum Likelihood Estimates (MLEs).
For all fits, the initial state at time zero is computed by assuming there were I(0) infected persons, 21 days before the first reported death (by definition time one). The number of initial number of susceptibilities was assumed to be the predicted 2020 population size of the given country [9]. The model is then simulated forward for 21 days, assuming exponential growth with transmission rate χ 0 , which is taken at the initial state of the model at time zero. Because the data were highly variable in the complexity of the patterns they showed, we considered three nested models of increasing complexity for each country. The first model (model 1) assumed simple exponential growth with a constant sampling probability, i.e. ρ 0 = ρ f and t ð1Þ w ¼ t ð1Þ w ¼ 0. This amounts to a period of exponential growth with transmission rate χ 0 in the pre-data period and then a constant transmission rate χ f over the observation period.
The second model (model 2) allowed the transmission rate to vary but kept the detection probability constant, i.e. ρ 0 = ρ f . The third model (model 3) allowed both the transmission rate and detection rate to vary, i.e. all parameters estimated. We determined the best model for each country by a sequence of likelihood ratio tests first comparing model 1 and model 2 and then model 2 to model 3.
Because we are using an optimization method, we do not have access to samples from the likelihood surface directly. Therefore, to obtain estimates of the parametric uncertainly in the final transmission rate, χ f , we computed 99% confidence intervals using the profile likelihood method. For each country computed the profile likelihood by optimizing the model along a fixed grid with points every 0.1 units centered on the MLE of χ f value from the previous fit. Points were added to the grid until the measured likelihood on either side of the MLE was greater than 9 log units lower than the measured maximum likelihood. We then fit a local polynomial regression (loess in R) to those points and found the predicted maximum likelihood parameter value [10] and the 99% CI by locating the points on either side of the MLE that were 3.84 log units below the maximum likelihood.

Fixed parameter values
We fixed several parameter values based on published work and our scientific judgement. The probability that a case would eventually die, ω = 1%, is based on estimates of the case fatality ratio for both asymptomatic and symptomatic patients (95% CI 0.5-4%) [11]. The latency rate, l EI ¼ 1 3 , was initial set based on our general sense of what was consistent with the pre-print literature available at the time, however, that value is has proven to be consistent with later reports [12]. The recovery rate, l IR ¼ 1 10 , was similarly set to be consistent with the available literature when the model was being developed. Likewise, longitudinal studies have shown that our assumption of an average infectious period of 10 days is reasonably consistent with the clinical data [13,14]. Although the clinical data paints a picture of the natural history of infection that is far more complex than our model can capture, the formulation of our model is consistent with the available data.

Outcomes
Our primary outcome of interest is the growth rate, r, at the end of observation period, which is derived from the transmission rate estimated from the data. Using the equation in [15] we can express the growth rate in terms of the model parameters We found that for most countries, the fraction of the population that was still susceptible at the end of the observation period was greater than 95%, therefore we omitted the term PrðSÞ ¼ S between countries. From the same paper we also have the equation showing that R 0 = 1 when r = 0. We also compare predicted deaths due to COVID-19 in each country through July 5th 2020 to the average number of deaths in a period of the same length (Fig 3). Predicted deaths were computed by simulating the number of daily deaths from the first observation though July 5th 2020 for each country and taking the mean value. Confidence intervals were computed as the relevant quantiles of the sum over 1000 simulations. Pre-COVID-19 deaths were based on allcause death data were downloaded from the WHO mortality database (https://www.who.int/ healthinfo/mortality_data/en/) for each country for which it was available. Using data from the most recent year, we computed the death rate and multiplied the death rate by the length of the interval from the first observation to July 5th 2020 for each country.

Results
We fit our model to data collected from 55 countries. Model fits are shown in Fig 1, and parameter values are given in Table 1. The model can capture the data well, with a few exceptions. The model was not able to find a robust fit to the data from Bangladesh; in general, the upside down 'V' shape to the deaths could not be captured well in such a short time series. Algeria had a similarly odd pattern in the deaths time series that the model could not capture in detail, although the overall trend in deaths was recovered. In previous versions of this paper, the model had a hard time fitting data from both Italy and Spain. However, given the longer time series and modifications to the model form, we now find that both Italy and Spain are well-captured by the model. Overall, the model slightly overestimates the number of deaths around the time where deaths begin to decrease. However, this was generally corrected if the time series was long enough. Including temporal heterogeneity in the time from infection to detection of a COVID-19 death would likely correct this; however, it is not clear that this is advisable, as death counts are likely under-reported.
All countries except Japan and Saudi Arabia were found to have lower transmission rates on May 6, 2020 than at the beginning of the observation period, suggesting a global decline in the transmission of COVID-19 though May 6, 2020. However, the initial transmission rate should be interpreted cautiously, as we allowed a wide range of infected persons to exist 21 days before the first observation. That is, the initial transmission rate parameter is rather a convolution of the unknown number of infected persons and initial growth rate consistent with the data.
We found significant evidence for subcritical dynamics in 27 countries (3 countries had subcritical point estimates but their CIs contained the epidemic threshold). Fig 2 shows the point estimates and CIs of the final transmission rate on May 6, 2020 for all countries, stratified by continent [16]. European countries had the highest probability of being subcritical (21 of 25) with Asia (7 of 15) and the Americas (2 of 11) having fewer subcritical countries. None of the countries in Africa that met the inclusion criteria had subcritical dynamics.
Generally, countries that were found to have both variable transmission rates and variable detection probabilities (model 3 in Table 1) show a pattern of level or increasing deaths coupled with a level or slightly declining incidence in number of reported cases. This pattern illustrates how viewing the case count data alone can be potentially misleading as declines in reported cases can be confounded by variation in the probability of detection (e.g. comparing Canada to Denmark). Some countries were found to have detection probabilities lower than 1% on May 6 2020; however, these values should not be over-interpreted as the simple linear model for changing detection probabilities imposes strong assumptions that are focused on capturing the general trend. Fig 3 shows the predicted number of deaths projected out to July 5, 2020 assuming that all parameter values are constant over the period May 6 through July 5 2020. The average duration of the country-level epidemic in European countries is longer than in Asia, leading to a higher level of death, despite Asian countries having on-average higher growth rates. However, in the Americas, the predicted deaths are higher with 8 of 11 countries having total predicted deaths greater than 0.1% of the total population by July 5, 2020. The model predicts 1,320,170 total COVID-19 deaths in all 55 countries on July 5, 2020; of those deaths, 21% are predicted to occur in the US. The deaths due to COVID-19 in Europe are lower than the average number of reported deaths in a period of the same length for all countries in the data set that also had all-cause death counts from previous years. However, in the Americas, the COVID-19 death counts are approaching the all-cause death levels in several countries, suggesting that COVID-19 deaths are approaching a doubling of all-cause deaths in those countries.

Discussion
Our model found evidence for reductions in transmission rates of COVID-19 in 53 of 55 examined countries. Encouragingly, of those countries, we found statistical evidence that the size of the epidemic is decreasing in 27 countries, i.e. the effective reproductive number is less than 1, using data up to May 6, 2020. This suggests that, despite the highly heterogeneous populations represented by these countries, the growth of COVID-19 outbreak can be reverted. Although our model cannot attribute exact causes to the global decline in transmissions rates, most countries implemented sustained, population-level social distancing efforts over a period of weeks to months. These efforts are highly likely to play a major role in reducing the transmission of COVID-19 [5].
We estimated that in countries with decreasing transmission, the rate of decrease is in general less than 0.1/day (average -0.04/day). Based on data from 8 European countries, the US, and China, we previously estimated that in the absence of intervention efforts, the epidemic can grow at rates between 0.19-0.29/day [2]. This means that the outbreak can grow rapidly and quickly wipe gains made though public heath efforts made if social distancing measures are completely relaxed. For example, if the rate of decrease under strong public health interventions is 0.1/day and the growth rate in the absence of public heath interventions is 0.2/ day, then, the number of cases averted in two weeks of intervention will be regained in only one week. Social distancing measures have their own social costs. Our results suggests alternative strategies to control the virus are needed in place when social distancing efforts are relaxed. Due to the uncertainties in the impact of each specific control measures, changes to policies should be made slowly because the signal of changing transmission can take weeks to fully propagate into current data streams as a result of the long lag between infection to case confirmation (as we estimated to be on average approximately 2 weeks).
Our goal in this paper was to develop a model that could be applied very broadly to multiple countries and we have made assumptions that facilitate that goal. However, our model makes key assumptions that should be considered when thinking about where these results fit into the vast collection of COVID-19 modeling papers. For example, we slightly privilege death counts over case counts in linking the population model to the data by assuming that the distribution of the time to death is known and that the probability of being a detected death is fixed over time. Likewise, we assume that at the country level, the change in transmission rates can be modeled by a simple linear function, which we believe is reasonable as interventions implemented at the local level are likely to lead to a smooth change when aggregated up to the population level. Our model produces reasonable fits to the global data, but out approach does not allow us to have unique models for each country that could almost certainly capture country-level trends with greater accuracy. Our model also makes strong simplifying assumptions about the natural history of infection, about which we are continuously learning. For example, our model assumes that contagiousness is constant over the infection, which is possibly variable over time [12]. We also assume that diagnosis does not affect transmission from infected persons. However, given that we are inferring broad, population-average parameters and allowing those parameters to change over time to reflect broad changes in the transmission dynamics, we believe that our results are are reasonable portrayals of reality despite using a simple model of the natural history of infection.
Overall, our results suggest that COVID-19 is controllable in diverse settings using a full range of strong and comprehensive non-pharmaceutical measures, and that future deaths from the disease are avoidable.