## Figures

## Abstract

We analyzed COVID-19 data through May 6th, 2020 using a partially observed Markov process. Our method uses a hybrid deterministic and stochastic formalism that allows for time variable transmission rates and detection probabilities. The model was fit using iterated particle filtering to case count and death count time series from 55 countries. We found evidence for a shrinking epidemic in 30 of the 55 examined countries. Of those 30 countries, 27 have significant evidence for subcritical transmission rates, although the decline in new cases is relatively slow compared to the initial growth rates. Generally, the transmission rates in Europe were lower than in the Americas and Asia. This suggests that global scale social distancing efforts to slow the spread of COVID-19 are effective although they need to be strengthened in many regions and maintained in others to avoid further resurgence of COVID-19. The slow decline also suggests alternative strategies to control the virus are needed before social distancing efforts are partially relaxed.

**Citation: **Romero-Severson EO, Hengartner N, Meadors G, Ke R (2020) Change in global transmission rates of COVID-19 through May 6 2020. PLoS ONE 15(8):
e0236776.
https://doi.org/10.1371/journal.pone.0236776

**Editor: **Grant Lythe,
University of Leeds, UNITED KINGDOM

**Received: **April 23, 2020; **Accepted: **July 14, 2020; **Published: ** August 6, 2020

**Copyright: ** © 2020 Romero-Severson et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **Data were downloaded from https://github.com/CSSEGISandData/COVID-19.

**Funding: **EORS, NH, RK we funded by Laboratory Directed Research and Development grant 20200002CR from Los Alamos National Laboratory. The sponsors played no role in the study design, data collection and analysis, decision to publish, and preparation of the manuscript.

**Competing interests: ** The authors have declared that no competing interests exist.

## Introduction

Since its initial outbreak in Wuhan, China in late 2019 and early 2020 [1], COVID-19 caused repeated rapid outbreaks across the global from February through April 2020. The extremely rapid spread of COVID-19 in China [2] does not appear to be an anomaly: the disease has shown a short doubling time (2.4-3.6 days) outside of China as well [3]. As of May 21, 2020, the virus caused 5,034,458 reported infections, and 328,730 deaths globally [4]. In response, most affected countries/regions have implemented strong social distancing efforts, such as school closures, working-from-home, shelter-in-place orders. As a result, the spread of COVID-19 slowed down substantially in some countries [5], leading to a flattening of the epidemic curve. As social distancing induces high costs to both society and individuals, plans to relax social distancing are discussed. However, changes in both the transmission rates and detection probabilities over time coupled with stochasticity due to reporting delays makes differentiating between truly subcritical dynamics and simply reduced transmission difficult.

In this report, we developed a deterministic-stochastic hybrid model and fitted the model to case incidence and death incidence time series data from 55 countries. Following the approach suggested by King et al. [6], we use a (partially) stochastic model and base our estimates on incidence rather than cumulative incidence data. Using both case count and death count of data allowed us to disentangle changes in surveillance intensity from changes in transmission [3]. We found evidence for large decreases in the country-level transmission rates, in several of the worst-affected countries. Importantly, using data up to May 6, 2020, we computed 99% confidence intervals to test whether or not the data were consistent with subcritical dynamics (i.e. the reproductive number *R* was below 1 on May 6, 2020). Most countries showed large decreases in transmission rates over time, and more than half of studied countries have transmission rates below the epidemic threshold. On the other hand, many countries still appear to be showing rapid exponential growth. Given its highly contagious nature, COVID-19 can spread rapidly when strong social distancing measures are lifted, even partially [3]. Alternative strategies that can effective control the virus are needed when social distancing measures are relaxed.

## Materials and methods

### Data

Case count and death data were downloaded from the Johns Hopkins GitHub repository (https://github.com/CSSEGISandData/COVID-19) through May 6, 2020. Data included aggregate counts of reported cases and deaths at the country level and contained no identifying information. Any country in the data that had more than 2000 cumulative cases and 100 deaths by May 6, 2020 was included. To minimize the effect of repatriated cases we started each time series on the first day when the cumulative number of cases exceeded 100. All data processing and model fitting were otherwise done on the incidence scale. To address obvious bulk-reporting issues in the data (e.g. sudden zeros in the data followed by very large numbers), we smoothed the data using Tukey’s 3-median method. Because several countries had days with a single death surrounded by days with no deaths, which the smoothing method would set to zero, days with a single death were not replaced with smoothed values. The original data and the smoothed data used for estimation are shown in S1 Fig in S1 File.

### Model

We model the spread of COVID-19 as a partially observed Markov process with real-valued states *S* (susceptible), *E* (exposed), *I* (infected), and *R* (removed) to describe the latent population dynamics, and integer-valued states *C*_{0} (to be counted), *Y*_{1} (counted cases), *D*_{0:3} (dying), and *Y*_{2} (counted deaths) to model sampling into the data. We use multiple states to model the counted deaths to produce an Erlang distribution with mean 21 days and standard deviation 11 days in the time to death based on previous estimates of the time to death [7]. The model and all of its parameters (Table 1) have time units of days. The latent population model is governed by the following ordinary differential equations,
where *χ*(*t*) is the time-variable transmission rate and *N* = *S* + *E* + *I* + *R* is assumed to be fixed over the run of the model. λ_{EI} is the rate at which exposed persons become infectious and λ_{IR} is the rate at which cases recover (i.e. are either no longer infectious or die). At every time interval, we sample persons moving from the *E* to *I* states into a stochastic arm of the model that is used to calculate the likelihood of the data.

The ‘Model’ column referrers to the best model defined in the model selection procedure. The values *r*_{0} and *r*_{f} are the initial and final exponential growth rates implied by the other model parameters. Note that values of *r* assume a fully susceptible population to ensure that the change in growth rates are comparable between countries. The term *I*(−21) indicates the number of infected persons 21 days prior to the first included observation. *χ*_{0}, *χ*_{f}, *r*_{0}, *r*_{f}, have day^{−1} units; *ρ*_{0} and *ρ*_{f} are probabilities. The error terms *ϵ*_{1} and *ϵ*_{2} are defined according to the definition of the overdispersion term used in the Negative Binomial distribution as implemented in R (i.e. in the limit as *ϵ* becomes large, the error becomes Poisson).

To relate the latent population model to data, we randomly sample individuals from the unobserved population into a stochastic process that models the random movement from infection to being either counted as an observed case or counted as an observed death. The number of persons sampled into the observation arm of the model over a time interval *dt* is given by *X* = (*X*_{1}, *X*_{2}, *X*_{3}, *X*_{4}) ∼ Multinomial(*g*(*E*λ_{EI} *dt*), {(*ρ*^{c}(*t*)*ω*^{c}, *ρ*^{c}(*t*)*ω*, *ρ*(*t*)*ω*^{c}, *ρ*(*t*)*ω*}) where *ρ*(*t*) and *ω* are the probabilities that an infected person will be counted or die respectively, *ρ*^{c}(*t*) and *ω*^{c} are the probabilities of being not sampled or not dying respectively, and *g*(*u*) is a stochastic function that maps from real-value *u* to integer *g*(*u*); it takes value ⌊*u*⌋ with probability *u* mod 1 and value ⌈*u*⌉ with probability 1 − (u mod 1). In plain English, the model tracks the random fate of each newly infected person as they move from the exposed to infected state with respect to eventually being observed as a case and/or a death in data. At each time step of length Δ*t*, the change in state space is given by,
where *F*_{t}|*C*_{0} is a random variate from , and *H*_{it} is a random variate from . *X*_{i} indicates the *i*^{th} element of Multinomial random variate defined above. The rate λ_{Y1} determines the rate at which persons who will be counted are counted (i.e. lower values of λ_{Y1} mean a longer delay in cases showing up in the data). The values of *Y*_{1:2} are set to zero at the beginning of every day such that they accumulate the simulated number of cases and deaths that occur each day. Both the transmission rate and detection probability are allowed to vary with time in the following way:
where *t*_{f} is the final time in the given dataset. In plain English, the transmission rate is constant up to some time, , where it linearly increases or decreases to the value *χ*_{f} by time . The model is constrained such that , that is, the transmission rate must be constant for at least 20 days before the end of the data collection period. This constraint is in place to avoid overfitting the final transmission rate. The detection rate likewise has a linear increase or decrease beginning at time *t*_{ρ}; however, the increase or decrease continues to a fixed point that is 20 days before the final datum. Variation in the detection rate (i.e. the probability that an infected case will be counted in a fixed interval) over the course of the epidemic can strongly bias estimates of the population growth rate (derivation for the exponential growth case in S1 File); not allowing the detection probability to change over time could lead to discordance in the case count and death count time series.

### Parameter estimation

We assume that the data are Negative Binomial-distributed, conditional on the simulated number of cases and deaths that occur in a given time interval, i.e. the number of cases in the *i*^{th} observation period has density NegBin(*Y*_{1}, *ϵ*_{1}) and the number of deaths in the *i*^{th} observation period has density NegBin(*Y*_{2}, *ϵ*_{2}). The Negative Binomial is parameterized such that the first argument is the expectation and the second is an inverse overdispersion parameter that controls the variance of the data about the expectation; as *ϵ*_{i} becomes large, the data model approaches a Poisson with parameter *Y*_{i}. Both *ϵ*_{1} and *ϵ*_{2} were estimated from the data.

Parameter estimation had two distinct steps: model selection and computation of confidence intervals. In the model selection phase, the model was fit to the data using an iterated particle-filtering method, implemented in the pomp R library [8]. To optimize the likelihood of the data, we used 1500 particles in 125 iterations for each country. To determine whether or not 1500 particles was sufficient to minimize the variance in the estimated the mean likelihood at a given parameter value we ran 20 independent particles filters on a single data set and found that the average deviation from the mean was less than 0.1 log units suggesting that we would be able to detect differences in the log likelihood greater than 0.1 log units. The reported likelihoods were measured using 4500 particles at the optimized Maximum Likelihood Estimates (MLEs).

For all fits, the initial state at time zero is computed by assuming there were *I*(0) infected persons, 21 days before the first reported death (by definition time one). The number of initial number of susceptibilities was assumed to be the predicted 2020 population size of the given country [9]. The model is then simulated forward for 21 days, assuming exponential growth with transmission rate *χ*_{0}, which is taken at the initial state of the model at time zero. Parameters have constraints *I*(0) ∈ [1, 1000], *ρ*_{0} ∈ [0, 1], *ρ*_{f} ∈ [0, 1], *t*_{χ} and *t*_{ρ} ∈ [0, *t*_{f} − 10]; all other parameters are constrained to be positive.

Because the data were highly variable in the complexity of the patterns they showed, we considered three nested models of increasing complexity for each country. The first model (model 1) assumed simple exponential growth with a constant sampling probability, i.e. *ρ*_{0} = *ρ*_{f} and . This amounts to a period of exponential growth with transmission rate *χ*_{0} in the pre-data period and then a constant transmission rate *χ*_{f} over the observation period. The second model (model 2) allowed the transmission rate to vary but kept the detection probability constant, i.e. *ρ*_{0} = *ρ*_{f}. The third model (model 3) allowed both the transmission rate and detection rate to vary, i.e. all parameters estimated. We determined the best model for each country by a sequence of likelihood ratio tests first comparing model 1 and model 2 and then model 2 to model 3.

Because we are using an optimization method, we do not have access to samples from the likelihood surface directly. Therefore, to obtain estimates of the parametric uncertainly in the final transmission rate, *χ*_{f}, we computed 99% confidence intervals using the profile likelihood method. For each country computed the profile likelihood by optimizing the model along a fixed grid with points every 0.1 units centered on the MLE of *χ*_{f} value from the previous fit. Points were added to the grid until the measured likelihood on either side of the MLE was greater than 9 log units lower than the measured maximum likelihood. We then fit a local polynomial regression (loess in R) to those points and found the predicted maximum likelihood parameter value [10] and the 99% CI by locating the points on either side of the MLE that were 3.84 log units below the maximum likelihood.

### Fixed parameter values

We fixed several parameter values based on published work and our scientific judgement. The probability that a case would eventually die, *ω* = 1%, is based on estimates of the case fatality ratio for both asymptomatic and symptomatic patients (95% CI 0.5-4%) [11]. The latency rate, , was initial set based on our general sense of what was consistent with the pre-print literature available at the time, however, that value is has proven to be consistent with later reports [12]. The recovery rate, , was similarly set to be consistent with the available literature when the model was being developed. Likewise, longitudinal studies have shown that our assumption of an average infectious period of 10 days is reasonably consistent with the clinical data [13, 14]. Although the clinical data paints a picture of the natural history of infection that is far more complex than our model can capture, the formulation of our model is consistent with the available data.

### Outcomes

Our primary outcome of interest is the growth rate, *r*, at the end of observation period, which is derived from the transmission rate estimated from the data. Using the equation in [15] we can express the growth rate in terms of the model parameters
We found that for most countries, the fraction of the population that was still susceptible at the end of the observation period was greater than 95%, therefore we omitted the term from the plots and tables to facilitate comparisons of changes in transmission and growth rates between countries. From the same paper we also have the equation
showing that *R*_{0} = 1 when *r* = 0.

We also compare predicted deaths due to COVID-19 in each country through July 5th 2020 to the average number of deaths in a period of the same length (Fig 3). Predicted deaths were computed by simulating the number of daily deaths from the first observation though July 5th 2020 for each country and taking the mean value. Confidence intervals were computed as the relevant quantiles of the sum over 1000 simulations. Pre-COVID-19 deaths were based on all-cause death data were downloaded from the WHO mortality database (https://www.who.int/healthinfo/mortality_data/en/) for each country for which it was available. Using data from the most recent year, we computed the death rate and multiplied the death rate by the length of the interval from the first observation to July 5th 2020 for each country.

## Results

We fit our model to data collected from 55 countries. Model fits are shown in Fig 1, and parameter values are given in Table 1. The model can capture the data well, with a few exceptions. The model was not able to find a robust fit to the data from Bangladesh; in general, the upside down ‘V’ shape to the deaths could not be captured well in such a short time series. Algeria had a similarly odd pattern in the deaths time series that the model could not capture in detail, although the overall trend in deaths was recovered. In previous versions of this paper, the model had a hard time fitting data from both Italy and Spain. However, given the longer time series and modifications to the model form, we now find that both Italy and Spain are well-captured by the model.

Smoothed data points are shown in black and the expectations from the model are shown as solid lines. Model results for cases are shown in blue colors and deaths are shown in red. The envelope shows the 95% CIs of the distribution of the data conditional on the maximum likelihood model fit for the best model for each country. The x-axis is measured in sequential days of the year 2020.

Overall, the model slightly overestimates the number of deaths around the time where deaths begin to decrease. However, this was generally corrected if the time series was long enough. Including temporal heterogeneity in the time from infection to detection of a COVID-19 death would likely correct this; however, it is not clear that this is advisable, as death counts are likely under-reported.

All countries except Japan and Saudi Arabia were found to have lower transmission rates on May 6, 2020 than at the beginning of the observation period, suggesting a global decline in the transmission of COVID-19 though May 6, 2020. However, the initial transmission rate should be interpreted cautiously, as we allowed a wide range of infected persons to exist 21 days before the first observation. That is, the initial transmission rate parameter is rather a convolution of the unknown number of infected persons and initial growth rate consistent with the data.

We found significant evidence for subcritical dynamics in 27 countries (3 countries had subcritical point estimates but their CIs contained the epidemic threshold). Fig 2 shows the point estimates and CIs of the final transmission rate on May 6, 2020 for all countries, stratified by continent [16]. European countries had the highest probability of being subcritical (21 of 25) with Asia (7 of 15) and the Americas (2 of 11) having fewer subcritical countries. None of the countries in Africa that met the inclusion criteria had subcritical dynamics.

Blue dots indicate point estimates and grey bars indicate the 99% confidence intervals. The red line delineates exponential growth from exponential decay. Countries below this line have a shrinking epidemic. Country names are indicated by ISO Alpha-3 codes.

Generally, countries that were found to have both variable transmission rates and variable detection probabilities (model 3 in Table 1) show a pattern of level or increasing deaths coupled with a level or slightly declining incidence in number of reported cases. This pattern illustrates how viewing the case count data alone can be potentially misleading as declines in reported cases can be confounded by variation in the probability of detection (e.g. comparing Canada to Denmark). Some countries were found to have detection probabilities lower than 1% on May 6 2020; however, these values should not be over-interpreted as the simple linear model for changing detection probabilities imposes strong assumptions that are focused on capturing the general trend.

Fig 3 shows the predicted number of deaths projected out to July 5, 2020 assuming that all parameter values are constant over the period May 6 through July 5 2020. The average duration of the country-level epidemic in European countries is longer than in Asia, leading to a higher level of death, despite Asian countries having on-average higher growth rates. However, in the Americas, the predicted deaths are higher with 8 of 11 countries having total predicted deaths greater than 0.1% of the total population by July 5, 2020. The model predicts 1,320,170 total COVID-19 deaths in all 55 countries on July 5, 2020; of those deaths, 21% are predicted to occur in the US. The deaths due to COVID-19 in Europe are lower than the average number of reported deaths in a period of the same length for all countries in the data set that also had all-cause death counts from previous years. However, in the Americas, the COVID-19 death counts are approaching the all-cause death levels in several countries, suggesting that COVID-19 deaths are approaching a doubling of all-cause deaths in those countries.

The distribution of the total number of predicted deaths in the observation period plus 60 days assuming final model maximum likelihood values on May 6 2020 are constant though July 5 2020. Blue dots indicate median values and grey bars show the 95% confidence interval for the predicted number of deaths. Red dots show the average number of deaths in an average period of the same length in the World Health Organization death data from the most recent year available. The y-axis is on the log10 scale. Country names are indicated by ISO Alpha-3 codes.

## Discussion

Our model found evidence for reductions in transmission rates of COVID-19 in 53 of 55 examined countries. Encouragingly, of those countries, we found statistical evidence that the size of the epidemic is decreasing in 27 countries, i.e. the effective reproductive number is less than 1, using data up to May 6, 2020. This suggests that, despite the highly heterogeneous populations represented by these countries, the growth of COVID-19 outbreak can be reverted. Although our model cannot attribute exact causes to the global decline in transmissions rates, most countries implemented sustained, population-level social distancing efforts over a period of weeks to months. These efforts are highly likely to play a major role in reducing the transmission of COVID-19 [5].

We estimated that in countries with decreasing transmission, the rate of decrease is in general less than 0.1/day (average -0.04/day). Based on data from 8 European countries, the US, and China, we previously estimated that in the absence of intervention efforts, the epidemic can grow at rates between 0.19-0.29/day [2]. This means that the outbreak can grow rapidly and quickly wipe gains made though public heath efforts made if social distancing measures are completely relaxed. For example, if the rate of decrease under strong public health interventions is 0.1/day and the growth rate in the absence of public heath interventions is 0.2/day, then, the number of cases averted in two weeks of intervention will be regained in only one week. Social distancing measures have their own social costs. Our results suggests alternative strategies to control the virus are needed in place when social distancing efforts are relaxed. Due to the uncertainties in the impact of each specific control measures, changes to policies should be made slowly because the signal of changing transmission can take weeks to fully propagate into current data streams as a result of the long lag between infection to case confirmation (as we estimated to be on average approximately 2 weeks).

Our goal in this paper was to develop a model that could be applied very broadly to multiple countries and we have made assumptions that facilitate that goal. However, our model makes key assumptions that should be considered when thinking about where these results fit into the vast collection of COVID-19 modeling papers. For example, we slightly privilege death counts over case counts in linking the population model to the data by assuming that the distribution of the time to death is known and that the probability of being a detected death is fixed over time. Likewise, we assume that at the country level, the change in transmission rates can be modeled by a simple linear function, which we believe is reasonable as interventions implemented at the local level are likely to lead to a smooth change when aggregated up to the population level. Our model produces reasonable fits to the global data, but out approach does not allow us to have unique models for each country that could almost certainly capture country-level trends with greater accuracy.

Our model also makes strong simplifying assumptions about the natural history of infection, about which we are continuously learning. For example, our model assumes that contagiousness is constant over the infection, which is possibly variable over time [12]. We also assume that diagnosis does not affect transmission from infected persons. However, given that we are inferring broad, population-average parameters and allowing those parameters to change over time to reflect broad changes in the transmission dynamics, we believe that our results are are reasonable portrayals of reality despite using a simple model of the natural history of infection.

Overall, our results suggest that COVID-19 is controllable in diverse settings using a full range of strong and comprehensive non-pharmaceutical measures, and that future deaths from the disease are avoidable.

## References

- 1. Li Q, Guan X, Wu P, Wang X, Zhou L, Tong Y, et al. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus–Infected Pneumonia. New England Journal of Medicine. 2020;382(13):1199–1207. pmid:31995857
- 2. Sanche S, Lin YT, Xu C, Romero-Severson E, Hengartner N, Ke R. Early Release - High Contagiousness and Rapid Spread of Severe Acute Respiratory Syndrome Coronavirus 2 - Emerging Infectious Diseases journal - CDC Volume 26, Number 7—July 2020.
- 3. Ke R, Sanche S, Romero-Severson E, Hengartner N. Fast spread of COVID-19 in Europe and the US and its implications: even modest public health goals require comprehensive intervention. medRxiv. 2020; p. 2020.04.04.20050427.
- 4. Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases. 2020;0(0).
- 5.
Flaxman S, Mishra S, Gandy A. 2020-03-30-COVID19-Report-13.pdf. Imperial College London; 2020. Available from: https://www.imperial.ac.uk/media/imperial-college/medicine/mrc-gida/2020-03-30-COVID19-Report-13.pdf.
- 6. King AA, Cellès MDd, Magpantay FMG, Rohani P. Avoidable errors in the modelling of outbreaks of emerging pathogens, with special reference to Ebola. Proc R Soc B. 2015;282(1806):20150347. pmid:25833863
- 7. Sanche S, Lin YT, Xu C, Romero-Severson E, Hengartner N, Ke R. High Contagiousness and Rapid Spread of Severe Acute Respiratory Syndrome Coronavirus 2. Emerging Infectious Diseases. 2020;26(7). pmid:32255761
- 8.
King A, Ionides E, Bretó C, Ellner S, Kendall B, Wearing H, et al. pomp: Statistical inference for partially observed Markov processes (R package); 2009.
- 9.
Washington DC CIA. The World Factbook 2020;. Available from: https://www.cia.gov/library/publications/resources/the-world-factbook/index.html.
- 10. Ionides EL. Maximum smoothed likelihood estimation. Statistica Sinica. 2005; p. 1003–1014.
- 11.
Dorigatti I, Okell L, Cori A, Imai N, Baguelin M, Bhatia S, et al. Severity of 2019-novel coronavirus (nCoV). Imperial College London; 2020. Available from: http://spiral.imperial.ac.uk/handle/10044/1/77154.
- 12. He X, Lau EHY, Wu P, Deng X, Wang J, Hao X, et al. Temporal dynamics in viral shedding and transmissibility of COVID-19. Nature Medicine. 2020;26(5):672–675. pmid:32296168
- 13. Wölfel R, Corman VM, Guggemos W, Seilmaier M, Zange S, Müller MA, et al. Virological assessment of hospitalized patients with COVID-2019. Nature. 2020;581(7809):465–469. pmid:32235945
- 14. Kujawski SA, Wong KK, Collins JP, Epstein L, Killerby ME, Midgley CM, et al. Clinical and virologic characteristics of the first 12 patients with coronavirus disease 2019 (COVID-19) in the United States. Nature Medicine. 2020; p. 1–8.
- 15. Ma J. Estimating epidemic exponential growth rate and basic reproduction number. Infectious Disease Modelling. 2020;5:129–141. pmid:31956741
- 16. Arel-Bundock V, Enevoldsen N, Yetman C. countrycode: An R package to convert country names and country codes. Journal of Open Source Software. 2018;3(28):848.