
Estimating the undetected emergence of COVID-19 in the US

  • Emily M. Javan ,

    Contributed equally to this work with: Emily M. Javan, Spencer J. Fox

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    emjavan@utexas.edu

    Affiliation Department of Integrative Biology, University of Texas at Austin, Austin, TX, United States of America

  • Spencer J. Fox ,

    Contributed equally to this work with: Emily M. Javan, Spencer J. Fox

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Department of Integrative Biology, University of Texas at Austin, Austin, TX, United States of America, Department of Epidemiology & Biostatistics and Institute of Bioinformatics, University of Georgia, Athens, GA, United States of America

  • Lauren Ancel Meyers

    Roles Conceptualization, Funding acquisition, Supervision, Writing – review & editing

    Affiliations Department of Integrative Biology, University of Texas at Austin, Austin, TX, United States of America, Santa Fe Institute, Santa Fe, New Mexico, United States of America

Abstract

As SARS-CoV-2 emerged as a global threat in early 2020, China enacted rapid and strict lockdown orders to prevent introductions and suppress transmission. In contrast, the United States federal government did not enact national orders. State and local authorities were left to make rapid decisions based on limited case data and scientific information to protect their communities. To support local decision making in early 2020, we developed a model for estimating the probability of an undetected COVID-19 epidemic (epidemic risk) in each US county based on the epidemiological characteristics of the virus and the number of confirmed and suspected cases. As a retrospective analysis we included county-specific reproduction numbers and found that counties with only a single reported case by March 16, 2020 had a mean epidemic risk of 71% (95% CI: 52–83%), implying COVID-19 was already spreading widely by the first detected case. By that date, 15% of US counties covering 63% of the population had reported at least one case and had epidemic risk greater than 50%. We find that a 10% increase in model estimated epidemic risk for March 16 yields a 0.53 (95% CI: 0.49–0.58) increase in the log odds that the county reported at least two additional cases in the following week. The original epidemic risk estimates made on March 16, 2020 that assumed all counties had an effective reproduction number of 3.0 are highly correlated with our retrospective estimates (r = 0.99; p<0.001) but are less predictive of subsequent case increases (AIC difference of 93.3 and 100% weight in favor of the retrospective risk estimates). Given the low rates of testing and reporting early in the pandemic, taking action upon the detection of just one or a few cases may be prudent.

Introduction

The COVID-19 (coronavirus disease of 2019) pandemic claimed over 350,000 American lives in 2020 [1]. Early in the pandemic, when confirmed case counts were still relatively low across the US, the federal government left decision making largely to state and local public authorities. Amidst great uncertainty, leaders faced the unprecedented challenge of balancing the threat of a mostly undetected but deadly virus against the economic and societal costs of shelter-in-place and travel restrictions. At the time, most COVID-19 cases were not reported given the high proportion of mild and asymptomatic infections, limited laboratory testing capacity and strict requirements for receiving tests (e.g. travel or contact with someone from Wuhan, China) [2, 3]. The CDC estimated that only one in ten COVID-19 infections were reported during the early phase of the pandemic [4].

As the first cases of COVID-19 were reported, decision makers urgently needed to determine whether they reflected sporadic clusters stemming from recent introductions or sustained community transmission that might evolve into a large epidemic. Decision makers in the southern US faced a similar challenge during the 2016 expansion of Zika virus (ZIKV) across the Americas: cryptic transmission meant that by the time a few cases were reported, a large epidemic could already be underway [5]. Here, we describe a stochastic susceptible-exposed-infected-recovered (SEIR) compartmental model framework for estimating the probability of an ongoing, undetected epidemic (epidemic risk) from scarce case data. In this study, we use the term epidemic to refer to the county-level reproduction number of SARS-CoV-2 being greater than one, the threshold between self-sustained epidemic growth and stuttering chains of transmission [6].

The approach was originally developed to support situational awareness for ZIKV and adapted for COVID-19. We apply it to estimating the risk of undetected COVID-19 epidemics in US counties during the emergence phase of the pandemic in early 2020. We present results from a model using the best estimates for COVID-19 epidemiological characteristics as of December 2022 (retrospective) and compare those results with those made in early March 2020 (original).

Results

We modeled the stochastic emergence of COVID-19 accounting for county-specific transmission risks, potential superspreading events, asymptomatic infections, and disease-specific epidemiological characteristics (Table 1). We assumed county-specific transmission rates ranging from Re of 1.4 to 4.4 with a median of 2.8 as estimated in [7]. Based on the underlying Re and a 10% case detection rate [4], the chance that a county had an underlying COVID-19 epidemic (epidemic risk) was 7–28% with no detected cases, 51–85% upon the detection of a single case, and over 99% by the time 11 or more cases were detected under scenarios without non-pharmaceutical interventions in place (Fig 1). For example, Travis County, TX (the primary county representing the city of Austin, TX) was estimated to have an Re of 2.0 [7]. Our model estimates a 95% epidemic risk for Travis County based on the four cumulative cases reported by March 13, 2020, which increases to 99% on March 20, 2020 when there were twenty-one cumulative reported cases (Fig 1).

Fig 1. Epidemic risk for the effective reproduction numbers (Re) corresponding to reduced risk (1.1) and the minimum (1.4), median (2.8), and maximum (4.4) estimated across all US counties.

For a given number of reported cases, epidemic risk increased with estimated Re. Epidemic risk is the percent of 100,000 simulations, for each Re, that become epidemics. We classified a simulation as an epidemic if it reached 2,000 cumulative infections and had a minimum prevalence of 50 new infections per day. By the time a single case was reported, there was a 13%, 45%, 81%, or 89% chance of an ongoing epidemic for an Re of 1.1 (reduced risk), 1.4 (minimum), 2.8 (median), or 4.4 (maximum), respectively. County-specific risk was estimated from these curves. For example, Travis County, TX (red lines) had an Re of 2.0, which corresponds to an epidemic risk of 95% on March 13, 2020 and 99% on March 20, 2020 based on cumulative reported case counts of four and twenty-one on those dates, respectively. If the Re was instead estimated as 1.1 in Travis County, then the estimated risk would decrease to 57% based on the twenty-one cases reported on March 20. The model assumed a 10% case detection rate, generation time of 6.0 days, a latent period of 2.9 days, and infectious period of 6.2 days (Table 1 - retrospective).

https://doi.org/10.1371/journal.pone.0284025.g001

Table 1. Model parameters used for simulating COVID-19 outbreaks.

https://doi.org/10.1371/journal.pone.0284025.t001

By March 16, 2020, counties reported between 0 and 489 cumulative cases totaling 4,009 nationally [16]. We estimate a national mean epidemic risk of 25% (95% CI: 11–99%) on that day and that epidemic risk exceeded 50% for roughly 15% of the 3,142 counties representing 63% of the US population (Fig 2A). By April 13, 2020, total reported cases in the US climbed to 467,158 and we estimate a mean epidemic risk of 82% (95% CI: 12–100%) with an estimated 85% of counties representing 96% of the US population having over a 50% epidemic risk (Fig 2B). Projected risks are generally higher for both larger transmission rates (Fig 1) and lower assumed case detection rates (S1 Fig). For the median Re = 2.8, the expected time between the first COVID-19 case report and the epidemic reaching 1,000 cumulative infections was 3.4 (95% CI 2.0–7.3) weeks. Waiting to act until the tenth reported case would shrink the time until 1,000 cumulative infections by 55% to 1.5 (95% CI 1.0–2.9) weeks (Fig 3).

Fig 2.

Estimated COVID-19 epidemic risk in 3,142 US counties as of March 16 (A) and April 13, 2020 (B). Epidemic risk was determined for each county based on its effective reproduction number (Re) as estimated in [7], alongside the number of reported cases in the county on the specific date as described in Fig 1. Epidemic risk is the percent of 100,000 simulations for the county that become epidemics. We classified a simulation as an epidemic if it reached 2,000 cumulative infections and had a minimum prevalence of 50 new infections per day [5]. County-specific Re ranged from 1.4 to 4.4 with a median of 2.8. The model assumed a 10% case detection rate, generation time of 6.0 days, a latent period of 2.9 days, and infectious period of 6.2 days (Table 1 - retrospective).

https://doi.org/10.1371/journal.pone.0284025.g002

Fig 3. Expected time until the local epidemic exceeds 1,000 cumulative infections in a county, assuming Re = 2.8, a 10% case detection rate, and generation time of 6.0 days.

For a given number of cumulative reported cases (x-axis), we assume an epidemic is underway then estimate the median and 95% CI (error bars) number of weeks until the cumulative infections reach or exceed 1,000. When the first case is reported, we expect cumulative infections to surpass 1,000 in 3.4 (95% CI 2.0–7.3) weeks; when the 10th case is reported, the expected lead time shrinks to 1.5 (95% CI 1.0–2.9) weeks. The estimates are based on 100,000 stochastic simulations of the retrospective model (Table 1).

https://doi.org/10.1371/journal.pone.0284025.g003

To validate our model, we compare the estimated epidemic risk on March 16, 2020 to increases in reported case counts in the following week (Fig 4A). We find that our estimates of epidemic risk correlate significantly with the probability that a county reported additional cases in the subsequent week (logistic regression, p<0.001). A 10% increase in estimated risk corresponds to an increase in the log odds of a county detecting at least one, two, or five new cases of 0.55 (95% CI 0.49–0.61), 0.53 (95% CI 0.49–0.58), and 0.57 (95% CI 0.53–0.61), respectively (Fig 4B and S2 Fig).

Fig 4. Comparison of estimated epidemic risks and reported increases in cases at the county level between March 16 and March 23, 2020.

(A) Proportion of all US counties that had the specified one-week increase in reported COVID-19 cases, compared to the cumulative case count in the county as of March 16, 2020 (x-axis) [16]. The light, medium and dark gray lines correspond to increases of at least one, two, or five new reported cases within one week, respectively. The red ribbon indicates model estimates for the probability that an epidemic is underway, depending on the cumulative reported cases. The bottom and top of the ribbon correspond to estimates for the lowest and highest risk counties across the United States, where risk is estimated based on county-specific estimates of Re and the cumulative number of reported cases on March 16. These estimates are calculated based on 100,000 simulations for each reproduction number (Re = 1.4 to 4.4 by 0.1), assuming a 10% case detection rate and a generation time of 6.0 days. (B) Estimates of epidemic risk on March 16 correlate with case count increases in the subsequent week across all counties. Points indicate whether counties reported at least two new COVID-19 cases between March 16 and March 23, where the bottom and top of the graph correspond to counties that did not or did report such increases, respectively. The line and shading indicate the estimated mean (line) and 95% confidence interval (ribbon) resulting from a logistic regression relating the actual one-week reported increase to estimated risk on March 16, 2020. We estimate that a 10% increase in estimated risk corresponds to a 0.53 (95% CI: 0.49–0.58) increase in the log odds that the county reported at least two additional cases in the following week.

https://doi.org/10.1371/journal.pone.0284025.g004

As additional validation of the modeling framework, we compare our retrospective estimates with those originally made in March 2020, before we had county-specific estimates of reproduction numbers (Table 1, Fig 5). At that time, we assumed all counties had the same effective reproduction number, ranging from 1.1 to 3.0; we also originally assumed a latent period of 1.25 rather than 2.9 days. Our original estimates assuming Re = 3.0 most closely match the retrospective estimates (Pearson’s product-moment correlation, r = 0.99; p<0.001). Our original county-level risk maps (S3–S5 Figs) and estimates for the time until counties reach 1,000 cumulative infections (S6 and S7 Figs) are also consistent with our retrospective analysis. Finally, our original estimates reliably predicted subsequent county case increases (S9 Fig). For example, assuming Re = 3.0, a 10% increase in estimated epidemic risk corresponds to an increase in the log odds of a county detecting at least one, two, or five new cases by March 23 of 0.48 (95% CI: 0.43–0.53), 0.49 (95% CI: 0.45–0.53), and 0.55 (95% CI: 0.51–0.59), respectively. Comparing logistic regression models built on the retrospective risk estimates to those built on the original estimates with Re = 3.0, we find that the retrospective risk estimates more accurately predict the probability of a county reporting at least two new cases in the week following March 16, 2020 (AIC difference of 93.3 and 100% weight in favor of the retrospective risk estimates).

Fig 5. Comparison of original epidemic risk estimates, assuming a uniform Re across counties, and retrospective estimates, assuming empirical county-level estimates of Re on March 16, 2020 across 3,142 US counties.

Each point corresponds to a pair of risk estimates (original on x-axis vs retrospective on y-axis) for a single county. Points are shaded according to the assumed effective reproduction number for the original estimate. The solid diagonal line indicates matching estimates.

https://doi.org/10.1371/journal.pone.0284025.g005

Discussion

The timing and rate of COVID-19 emergence varied widely across the US [17]. The earliest of the 3,142 US counties to report a case was Snohomish County, Washington on January 21, 2020. By the first of March, April, and May, 1%, 70%, and 90% of all counties had reported at least one case, respectively. Using county-specific transmission risk estimates (Re) and a 10% case detection rate, we estimate that by the time a county reported its first case it had at least a 50% chance of a growing epidemic. On March 16, 2020, the mean risk was 25% (95% CI: 11–99%), and by April 13, 2020, risk exceeded 90% in 67% of counties containing 94% of the US population.

Our framework was developed in the first months of the COVID-19 pandemic to provide local situational awareness for both government officials and the public, at a time when the data were highly uncertain. On April 3, 2020, the New York Times published our first set of estimates as a national risk map appearing on the front page [18], which reached an estimated 694 million unique viewers according to Meltwater [19]. The map estimated that 70% of 3,142 US counties containing 94% of the US population had reported at least one COVID-19 case, resulting in over a 50% chance of having an epidemic (epidemic risk). We thus believe that our estimates may have helped communities understand the silent but rapid expansion of the virus. By that date, ten states (Alabama, Arkansas, Iowa, Missouri, Nebraska, North Dakota, South Carolina, South Dakota, Utah, and Wyoming) had not yet enacted statewide stay-at-home orders [20]. While our estimates may not have directly swayed state policy makers, they provided evidence in support of strong mitigation despite low reported case counts in many areas that took action.

As validation, we compared our epidemic risk estimates to the proportion of counties that reported additional cases between March 16 and March 23, 2020 (Fig 4 and S8 Fig). We found that our epidemic risk estimates significantly correlate with subsequent case increases. We limit our historical comparison to the period prior to March 23, 2020, after which unprecedented COVID-19 lockdowns and social distancing policies decreased epidemic risks [13, 21–24]. We also compare the results of our analysis with estimates that we originally made on March 16, 2020, before we had data that allowed us to estimate county-specific SARS-CoV-2 reproduction numbers. At that time, we made the simplifying assumption that transmission rates were uniform across counties. Our original estimates are consistent with our retrospective estimates, though slightly less accurate (Fig 5). Importantly, even the data-limited analyses provided a clear indication of the extent of undetected epidemic risk and the urgency of action across the US.

Our results are consistent with the current understanding of early COVID-19 transmission in the United States. Epidemiological and phylodynamic models identified substantial, undocumented, COVID-19 transmission leading up to stay-at-home orders in late March 2020 [14, 25], with non-pharmaceutical interventions reducing transmission and preventing infections and mortality [24, 26]. Proactive responses to COVID-19 have been estimated to shorten the duration of costly measures [27, 28], whereas delays have likely cost lives [26, 29]. Thus, our results suggest that the first reported case should trigger action if the goal is to fully contain an emerging outbreak as quickly as possible. The risk of an ongoing epidemic likely already exceeds 50% and delaying action may substantially reduce the window for corrective action and amassing adequate healthcare and other mitigation resources (Fig 3).

Our analyses make several key assumptions. First, case detection rates may vary geographically and change through time depending on testing availability and regulations. Our assumption of 10% is based on a CDC seroprevalence study, which reported that rates ranged from 4% to 16% across ten sites [4]. Second, we modeled superspreading events based on estimates for SARS-CoV in Singapore in 2003 [15], which are consistent with more recent reports for SARS-CoV-2 [30–32]. Third, our estimates do not account for repeated importations of infected individuals; all simulations start with a single infected individual. Multiple importations would reduce our estimated levels of epidemic risk, since reported cases could reflect independent clusters rather than continuous chains of transmission. Finally, we considered scenarios with lower effective reproduction numbers than estimated (Fig 1 and S3–S7 Figs), which may be more appropriate for epidemic risks following the enactment of non-pharmaceutical interventions, but we did not account for changes to the effective reproduction number over time. While transmission can vary temporally depending on local policies, testing efforts [33, 34], and behavior [13], we made these simplifying assumptions because our overall goal was to estimate county-level epidemic risk in the absence of interventions.

While simple, this modeling framework provided useful insight during the highly uncertain early phase of the US Zika virus epidemic in 2016 [5]. Here, we have shown how it can be quickly adapted to provide rapid situational awareness for COVID-19 in real time, despite the virus's different epidemiological characteristics. The similarity between our original and retrospective analyses further validates the robustness and reliability of the original epidemic risk estimates and provides a framework for implementation during future emerging infectious disease outbreaks. Overall, we find that for silently spreading pathogens, proactive control measures may be prudent, even before the threat becomes apparent [35].

Methods

Data

We obtained daily county-level estimates of confirmed and suspected COVID-19 cases from a data repository curated by the New York Times [16]. The US county map was based on TIGER/Line shapefiles provided by the US Census Bureau [36] and accessed through ‘tidycensus’ version 1.2.3 for the year 2019 [37]. Estimates of each county’s 2019 population from the US Census Bureau [38] were used only to estimate the proportion of the population likely experiencing an epidemic (epidemic risk greater than 50%). For the original analysis, epidemic risk is based on a county’s cumulative reported COVID-19 cases and an effective reproduction number (Re) assumed to be the same for all counties. Our baseline scenario assumed the Re of SARS-CoV-2 was 1.5, accounting for ongoing social distancing measures across the US by mid-April 2020 [39], and that 10% of all cases were reported [4]. Parameter estimates for the original analysis were taken from the literature available by mid-March 2020, when much about SARS-CoV-2 was still unknown (Table 1).

For the retrospective analysis, epidemic risk is based on county-specific cumulative reported cases and county-specific effective reproduction numbers (Re). Epidemiological parameters for the model are drawn from a literature search carried out in December 2022, which updated the best estimate for the COVID-19 latent period from 1.25 to 2.9 days (see comparison in Table 1). We assume that the county Re equals the basic reproduction number estimated in [7] for all counties in the contiguous US. As population density and urban-rural classification are strong predictors of the COVID-19 reproduction number [9, 40], we estimated the Re for counties in Alaska and Hawaii as the mean Re, rounded to the nearest tenth, of all contiguous US counties with the same urban-rural designation code, as defined by the 2013 National Center for Health Statistics Urban-Rural Classification Scheme for Counties [10]. In total, counties had twenty-nine different Re values ranging from 1.4 to 4.4. We also included Re = 1.1 to simulate a possible social distancing scenario when counties were under shelter-in-place orders [26].
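The Re imputation for Alaska and Hawaii described above amounts to a group mean by urban-rural code. A minimal sketch follows; the published analysis was carried out in R, and the county values and codes here are made up for illustration:

```python
import numpy as np

def impute_re(contiguous_re, contiguous_urc, target_urc):
    """Impute Re for a non-contiguous county as the mean Re of all
    contiguous-US counties sharing its urban-rural classification code,
    rounded to the nearest tenth. Inputs are illustrative, not the
    paper's actual county data."""
    re = np.asarray(contiguous_re, dtype=float)
    urc = np.asarray(contiguous_urc)
    return round(float(re[urc == target_urc].mean()), 1)

# Hypothetical contiguous-US counties: Re values and urban-rural codes.
# A code-1 county elsewhere would be assigned the mean of 2.9 and 3.1.
print(impute_re([2.9, 3.1, 1.4], [1, 1, 6], target_urc=1))  # -> 3.0
```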

Model

We adapted the framework developed for another silent spreader, Zika virus, which threatened to emerge in southern US states in 2016 [5], to model COVID-19 spread in US counties. The discrete-time SEIR model assumed a branching process for early transmission in which the number of secondary infections per infected case followed a negative binomial distribution to capture occasional superspreading events, as estimated for SARS-CoV outbreaks in 2003 [15]. Similar to the methods in [5], the exposed and infectious periods consisted of “boxcars”: smaller consecutive compartments that each individual must pass through. Boxcars enforce a minimum number of days spent in each compartment and yield a more realistic, negative binomially distributed waiting time [41, 42]. For example, an infectious period of 9.5 days could be modeled as one compartment with a daily transition rate of 1/9.5 or broken up into seven boxcars, each with a daily transition rate of 7/9.5.
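The boxcar construction can be checked analytically: in discrete time, each stage has a geometric waiting time, so splitting a compartment into more boxcars preserves the mean residence time while shrinking its variance. A short sketch of that calculation (ours, not from the paper):

```python
def discrete_stage_moments(n_boxcars, mean_days):
    """Mean and variance of total days spent in a compartment split into
    n_boxcars stages, each with daily exit probability n_boxcars/mean_days.
    The waiting time per stage is geometric, so the total is negative
    binomial; the total mean is n_boxcars * mean_days / n_boxcars = mean_days."""
    p = n_boxcars / mean_days          # daily transition probability per stage
    stage_mean = 1.0 / p               # geometric mean (days until transition)
    stage_var = (1.0 - p) / p ** 2     # geometric variance
    return n_boxcars * stage_mean, n_boxcars * stage_var

# One compartment with rate 1/9.5 vs seven boxcars with rate 7/9.5:
# both give a mean of 9.5 days, but the variance shrinks roughly 24-fold.
print(discrete_stage_moments(1, 9.5))
print(discrete_stage_moments(7, 9.5))
```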

We account for imperfect detection and COVID-19 specific epidemiological characteristics for both original and retrospective scenarios (Table 1). We did not explicitly model asymptomatic or pre-symptomatic transmission and thus maintained a low detection probability for all infections in both scenarios. To assess the impact of parameter assumptions on our estimates of epidemic risk alongside the impact of behavioral and policy changes that might have altered the effective reproduction number, we conducted a sensitivity analysis that varied Re from 1.1 to 3.0 (S3S5 Figs) and detection rates from 5% to 40% (S1 Fig).

Our goal was to estimate the probability that an outbreak in a region would become an epidemic based on the number of observed reported cases in the region, assuming no behavioral changes or public health interventions. As such, we ran 100,000 stochastic outbreak simulations per scenario (Re held constant), each beginning with a single undetected case and ending when cumulative infections reached 2,000 or the outbreak died out (whichever came first). Because we modeled transmission as a branching process, the susceptible population did not deplete as it would in other compartmental SEIR models. Following the methodology of [5], simulated outbreaks that reached 2,000 cumulative infections and had a minimum prevalence of 50 new infections per day were classified as epidemics. The epidemic criteria were chosen conservatively to give self-limiting outbreaks sufficient time to die out and be differentiated from self-sustaining transmission chains (S10 Fig); if simulations were terminated too soon, some self-limiting transmission chains could reach the maximum cumulative infections by chance. We calculated epidemic risk for a given number of detected cases, x, as the proportion of outbreak simulations reaching x reported cases that progressed to epidemics. For example, if 40% of the simulations that reached x reported cases satisfied the epidemic criteria, then the epidemic risk given x reported cases was 40%. For simulations that became epidemics (i.e., satisfied the criteria above), we calculated the distribution of lags (in weeks) between the day the xth case was reported and the day the epidemic surpassed 1,000 cumulative infections (Fig 3 and S7 Fig). Confidence intervals were calculated with the quantile function in R version 3.6.1 for both the original and retrospective scenarios [43].
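The simulation-and-classification procedure can be sketched with a simplified, generation-based branching process. This is not the paper's model: the paper steps daily through boxcar SEIR compartments and also applies the 50-infections-per-day prevalence criterion, both of which this sketch omits. The dispersion value k = 0.16 follows the SARS-CoV superspreading estimate cited above:

```python
import numpy as np

def simulate_outbreak(rng, R, k=0.16, detect_p=0.10, threshold=2000):
    """One branching-process outbreak starting from a single undetected
    infection. Offspring counts are negative binomial with mean R and
    dispersion k (superspreading); each new infection is independently
    reported with probability detect_p. Returns (reached_threshold,
    total_detected)."""
    infected, cumulative, detected = 1, 1, 0
    p = k / (k + R)  # numpy NB parameterization giving mean R
    while infected > 0 and cumulative < threshold:
        offspring = int(rng.negative_binomial(k, p, size=infected).sum())
        detected += int(rng.binomial(offspring, detect_p)) if offspring else 0
        cumulative += offspring
        infected = offspring
    return cumulative >= threshold, detected

def epidemic_risk(R, min_detected=1, n_sims=5000, seed=1):
    """Fraction of simulated outbreaks with at least min_detected reported
    cases that go on to reach the cumulative-infection threshold."""
    rng = np.random.default_rng(seed)
    runs = [simulate_outbreak(rng, R) for _ in range(n_sims)]
    relevant = [epi for epi, det in runs if det >= min_detected]
    return sum(relevant) / len(relevant) if relevant else 0.0
```

With enough simulations, the conditional risk after one detected case comes out much higher at the county-median Re of 2.8 than at 1.1, qualitatively matching Fig 1.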

County epidemic risk assignment

We matched county cumulative reported cases (confirmed and suspected, from [16]) with epidemic risk for each US county based on simulated cumulative reported cases (original) and, additionally, the matching county-specific Re (retrospective). For example, if a county reported ten cumulative cases on March 16, 2020 and had an estimated Re of 2.0, then its epidemic risk was assigned by looking up the proportion of simulations run with a constant Re of 2.0 that became epidemics after detecting ten cumulative cases. The retrospective analysis used the county-specific estimated Re described above together with cumulative cases. In the original analysis presented in the supplement, all counties share the same effective reproduction number of 1.1, 1.5, or 3.0, and risk varies only with cumulative reported cases.
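The assignment itself is a table lookup into precomputed simulation output. A minimal sketch with hypothetical risk values (the real curves come from the 100,000 simulations per Re described above):

```python
def assign_county_risk(risk_curves, re_value, cumulative_cases):
    """Look up epidemic risk for a county. risk_curves maps an Re value to
    a list where index i holds the simulated epidemic risk given i
    cumulative reported cases (values below are hypothetical). Counties
    reporting more cases than any simulation saw are clamped to the
    largest simulated count."""
    curve = risk_curves[re_value]
    return curve[min(cumulative_cases, len(curve) - 1)]

# Hypothetical risk curve for Re = 2.0: risk at 0, 1, 2, and 3+ cases
curves = {2.0: [0.20, 0.55, 0.75, 0.90]}
print(assign_county_risk(curves, 2.0, 1))   # -> 0.55
print(assign_county_risk(curves, 2.0, 10))  # clamped -> 0.90
```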

Model validation

To validate the estimates of epidemic risk, we used the county-specific estimated epidemic risk on March 16, 2020 (x-axis) as a predictor of whether US counties reported at least one, two, or five new cases over the week of March 16 to 23, 2020 (y-axis) in a logistic regression model. First, we calculated county-specific epidemic risk on March 16 as described above. Second, case counts on March 16 were subtracted from those on March 23, and the difference was classified as an increase of at least one, two, or five cases (three separate binary classifications). Finally, a logistic regression was fit to each classification independently to determine whether estimated epidemic risk on March 16 was a significant predictor of new reported cases one week later. We compared case counts from Monday to Monday to avoid weekend reporting bias; this week in mid-March preceded most lockdowns in the US and saw only a moderate increase in daily tests nationally (from 20,000 to 60,000) [44]. We estimate logistic regression models based on the retrospective analysis with county-specific transmission rates (Fig 4B and S2 Fig) and for the original analysis across all assumed effective reproduction numbers (S9 Fig). We compare the two analyses through the Pearson’s product-moment correlation of estimated epidemic risks (Fig 5), the estimated logistic regression coefficients, and the model AIC and weights as calculated in the ‘bbmle’ R package version 1.0.25 [45]. As a secondary, qualitative validation, we group counties by their cumulative reported case counts up to March 16, 2020 and estimate the proportion of those counties that reported one, two, or five new cases in the subsequent week. We then compare that proportion with the range of epidemic risks estimated across those counties from March 16 (Fig 4A and S8 Fig).
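The outcome construction and the interpretation of the fitted coefficient can be sketched as follows. The paper's regressions were fit in R; this Python sketch uses made-up case counts and only prepares the binary outcomes and converts the reported log-odds coefficient to an odds multiplier:

```python
import math
import numpy as np

def one_week_increase_labels(cases_start, cases_end, thresholds=(1, 2, 5)):
    """Binary outcomes for the validation regressions: did each county
    report at least t additional cases between the two dates? One label
    vector per threshold, as in the paper; counts here are illustrative."""
    diff = np.asarray(cases_end) - np.asarray(cases_start)
    return {t: (diff >= t).astype(int) for t in thresholds}

# Three hypothetical counties: cumulative cases on March 16 and March 23
labels = one_week_increase_labels([0, 1, 10], [0, 3, 16])

# Interpreting the reported coefficient: a 10% rise in epidemic risk adds
# 0.53 to the log odds, i.e. multiplies the odds by exp(0.53), about 1.70.
odds_multiplier = math.exp(0.53)
```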

Supporting information

S1 Fig. Sensitivity analysis of original March 16, 2020 risk estimates (assuming constant Re) with respect to the assumed reproduction number (Re) and case detection probability.

Percentage of US counties (left) or US population living in counties (right) that have greater than a 50% risk for sustained local transmission across varying assumed transmission rates (shade) and case detection probabilities (x-axis).

https://doi.org/10.1371/journal.pone.0284025.s001

(TIF)

S2 Fig. Comparison of estimated epidemic risks (based on data available as of March 16, 2020) and reported increases in cases at the county level between March 16 and March 23, 2020.

Points indicate the binary outcome for each county of whether it reported at least one (A) or five (B) new COVID-19 cases between March 16 and March 23. The bottom and top of the graph correspond to counties that did not or did report such increases, respectively. The line and shading indicate the estimated mean (line) and 95% confidence interval (ribbon) resulting from a logistic regression relating the actual one-week reported increase to estimated risk on March 16, 2020. We estimate that a 10% increase in model estimated epidemic risk for March 16 yields a 0.55 (95% CI 0.49–0.61) and 0.57 (95% CI 0.53–0.61) increase in the log odds that the county reported at least one or five additional cases in the following week, respectively.

https://doi.org/10.1371/journal.pone.0284025.s002

(TIF)

S3 Fig.

Original county-level estimates of ongoing COVID-19 epidemics assuming Re = 1.5 for (A) March 16, 2020 and (B) April 13, 2020. Estimated epidemic risk increased from 9% for zero cases to 50% when one case was detected and 100% for twenty-five or more cases. (A) By March 16, 2020, epidemic risk exceeded 50% in roughly 15% of the 3,142 counties covering 63% of the US population. (B) By April 13, 2020, we estimated that over 85% of US counties comprising 96% of the national population had at least a 50% chance of having an epidemic already underway. The estimates assume a 10% case detection rate and generation time of 6.0 days.

https://doi.org/10.1371/journal.pone.0284025.s003

(TIF)

S4 Fig.

Original county-level estimates of ongoing COVID-19 epidemics assuming Re = 1.1 for (A) March 16, 2020 and (B) April 13, 2020. Epidemic risk increased from 2% for zero cases to 13% when one case was detected. An Re of 1.1 may be appropriate for counties with strict social distancing measures. The model assumes the original parameter estimates, including a 10% case detection rate and generation time of 6.0 days.

https://doi.org/10.1371/journal.pone.0284025.s004

(TIF)

S5 Fig.

Original county-level estimates of ongoing COVID-19 epidemics assuming Re = 3.0 for (A) March 16, 2020 and (B) April 13, 2020. Epidemic risk increased from 22% for zero cases to 83% when one case was detected. The model also assumes the original parameter estimates, including a 10% case detection rate and generation time of 6.0 days.

https://doi.org/10.1371/journal.pone.0284025.s005

(TIF)

S6 Fig. Sensitivity analysis of original estimates (assuming uniform Re across counties) with respect to the effective reproduction number (Re).

For a given number of reported cases, the estimated risk of an epidemic increased with Re. By the time a single case is reported, there is a 13%, 50% or 83% chance of an ongoing epidemic for an Re of 1.1, 1.5 or 3.0, respectively.

https://doi.org/10.1371/journal.pone.0284025.s006

(TIF)

S7 Fig.

Expected time until an epidemic exceeds 1,000 cumulative infections in a county, assuming (A) Re = 1.5 or (B) Re = 3.0, a 10% case detection rate, and a generation time of 6.0 days. For a given number of cumulative reported cases (x-axis), we estimate the median and 95% CI (error bars) of the number of weeks until cumulative infections reach or exceed 1,000, among simulations classified as epidemics. (A) For Re = 1.5, when the first case is reported, cumulative infections surpass 1,000 in 7.5 (95% CI 3.9–16.3) weeks; when the 10th case is reported, the expected lead time shrinks to 4.4 (95% CI 2.1–11.4) weeks. (B) Increasing Re to 3.0 shortens the lag time to 3.0 (95% CI 1.6–6.6) weeks for the first case, decreasing to 1.1 (95% CI 0.6–2.4) weeks by the tenth case. Negative estimates indicate that 1,000 infections are reached before a given number of cumulative cases is reported. The estimates are based on 100,000 stochastic simulations per Re.

https://doi.org/10.1371/journal.pone.0284025.s007

(TIF)

S8 Fig. Comparison between original estimates of epidemic risk on March 16, 2020 (assuming constant Re across counties) and observed percent of US counties in which COVID-19 case counts increased from March 16 to March 23, 2020, as a function of cumulative reported cases as of March 16, 2020.

The light, medium and dark gray lines correspond to increases of at least one, two, or five new reported cases within one week, respectively. The red ribbon indicates the original model-estimated epidemic risk, given the cumulative reported cases on March 16, 2020 indicated on the x-axis. The bottom and top of the ribbon correspond to estimates assuming Re = 1.5 and Re = 3.0, respectively. These estimates are calculated based on 100,000 simulations for each reproduction number, assuming a 10% case detection rate and a generation time of 6.0 days. The odds of a county detecting at least five new cases increased by a factor of 4.90 (95% CI 4.14–5.99) for each additional case reported as of March 16. For example, a county reporting only one case as of March 16 was roughly five times more likely to report at least six new cases a week later than a county with no previously reported cases.

https://doi.org/10.1371/journal.pone.0284025.s008

(TIF)

S9 Fig. Correlations between original epidemic risk estimates for March 16, 2020 (assuming uniform Re) and case count increases in the subsequent week across all counties.

Points indicate the binary outcome for each county of whether it reported at least one, two, or five (rows) new COVID-19 cases between March 16 and March 23 under different assumed effective reproduction numbers (columns). Lines and ribbons indicate the estimated means and 95% confidence intervals for the fitted logistic regression models.

https://doi.org/10.1371/journal.pone.0284025.s009

(TIF)

S10 Fig. Choice of outbreak simulation length evaluated at Re values close to one, with both original and retrospective parameters.

If simulations end too early, more simulations than expected can reach the cumulative-infection threshold by random chance alone and thus fail to reflect true epidemics. By ending simulations at 2,000 cumulative infections, we could confidently separate simulations with Re just below one (0.95, top row) from the epidemics of those with Re just above one (1.05, bottom row). For Re = 0.95, 0.0% of original and 0.003% of retrospective simulations reached 2,000 cumulative infections and met the minimum prevalence of 50 new infections. When Re increased to 1.05, 0.11% of original and 0.10% of retrospective simulations were classified as epidemics. Ending simulations at 2,000 cumulative infections and requiring a minimum prevalence of 50 new infections per day is therefore sufficient to distinguish self-limiting simulations from self-sustaining ones.

https://doi.org/10.1371/journal.pone.0284025.s010

(TIF)
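The classification rule described above — ending simulations at 2,000 cumulative infections and requiring a minimum prevalence of 50 new infections — can be sketched as a simple branching process. The Python sketch below is illustrative only: the generation-by-generation structure, the negative binomial offspring distribution, and the dispersion value k = 0.16 are assumptions chosen to mimic superspreading, not the paper's fitted model (the original simulations were implemented in R).

```python
import math
import random

def poisson(lam, rng):
    """Knuth's algorithm for sampling a Poisson variate (fine for modest lam)."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def is_epidemic(re, k=0.16, max_cum=2000, min_prev=50, rng=None):
    """Simulate a branching process, generation by generation, until
    extinction or max_cum cumulative infections. Offspring per case are
    negative-binomially distributed (as a Gamma-Poisson mixture) with
    mean re and dispersion k; k = 0.16 is an illustrative superspreading
    value, not the paper's estimate. A run counts as an epidemic if it
    reaches max_cum while the current generation still contains at least
    min_prev new infections."""
    rng = rng or random.Random()
    active = cumulative = 1
    while active > 0 and cumulative < max_cum:
        new = sum(poisson(rng.gammavariate(k, re / k), rng) for _ in range(active))
        active, cumulative = new, cumulative + new
    return cumulative >= max_cum and active >= min_prev

# With re well below one, essentially no introductions reach the
# 2,000-infection threshold; with re = 3.0, a sizable minority take off.
rng = random.Random(1)
frac_takeoff = sum(is_epidemic(3.0, rng=rng) for _ in range(200)) / 200
```

Because a subcritical chain can, in principle, drift upward by chance, the joint cumulative-size and minimum-prevalence criterion is what lets the classification separate self-limiting runs from self-sustaining ones, as S10 Fig verifies.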

References

  1. Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis. 2020;20: 533–534. pmid:32087114
  2. Ansari FM, Aggarwal K, Chopra A, Agrawal MG, Soni P, Agarwal P, et al. Asymptomatic coronavirus: a boon or bane? J Adv Med Dent Sci Res. 2020;8: 109–111.
  3. Li R, Pei S, Chen B, Song Y, Zhang T, Yang W, et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science. 2020;368: 489–493. pmid:32179701
  4. Havers FP, Reed C, Lim T, Montgomery JM, Klena JD, Hall AJ, et al. Seroprevalence of antibodies to SARS-CoV-2 in 10 sites in the United States, March 23–May 12, 2020. JAMA Intern Med. 2020;180: 1776–1786. pmid:32692365
  5. Castro LA, Fox SJ, Chen X, Liu K, Bellan SE, Dimitrov NB, et al. Assessing real-time Zika risk in the United States. BMC Infect Dis. 2017;17: 1–9. pmid:28468671
  6. Blumberg S, Lloyd-Smith JO. Inference of R0 and transmission heterogeneity from the size distribution of stuttering chains. PLoS Comput Biol. 2013;9: 1–17. pmid:23658504
  7. Ives AR, Bozzuto C. Estimating and explaining the spread of COVID-19 at the county level in the USA. Commun Biol. 2021;4. pmid:33402722
  8. Shim E, Tariq A, Choi W, Lee Y, Chowell G. Transmission potential and severity of COVID-19 in South Korea. Int J Infect Dis. 2020;93: 339–344. pmid:32198088
  9. Sy KTL, White LF, Nichols BE. Population density and basic reproductive number of COVID-19 across United States counties. PLoS One. 2021;16: 1–11. pmid:33882054
  10. National Center for Health Statistics. 2013 NCHS urban-rural classification scheme for counties. 2017. Available: https://www.cdc.gov/nchs/data_access/urban_rural.htm#Data_Files_and_Documentation
  11. He X, Lau EHY, Wu P, Deng X, Wang J, Hao X, et al. Temporal dynamics in viral shedding and transmissibility of COVID-19. Nat Med. 2020;26: 672–675. pmid:32296168
  12. Hart WS, Abbott S, Endo A, Hellewell J, Miller E, Andrews N, et al. Inference of the SARS-CoV-2 generation time using UK household data. Elife. 2022;11: 1–30. pmid:35138250
  13. Fox SJ, Lachmann M, Tec M, Pasco R, Woody S, Du Z, et al. Real-time pandemic surveillance using hospital admissions and mobility data. Proc Natl Acad Sci U S A. 2022;119. pmid:35105729
  14. Perkins TA, Cavany SM, Moore SM, Oidtman RJ, Lerch A, Poterek M. Estimating unobserved SARS-CoV-2 infections in the United States. Proc Natl Acad Sci U S A. 2020;117: 22597–22602. pmid:32826332
  15. Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM. Superspreading and the effect of individual variation on disease emergence. Nature. 2005;438: 355–359. pmid:16292310
  16. The New York Times. Coronavirus (Covid-19) data in the United States. 2021 [cited 15 Dec 2022]. Available: https://github.com/nytimes/covid-19-data
  17. Zeller M, Gangavarapu K, Anderson C, Smither AR, Vanchiere JA, Rose R, et al. Emergence of an early SARS-CoV-2 epidemic in the United States. Cell. 2021;184: 4939–4952.e15. pmid:34508652
  18. Glanz J, Bloch M, Singhvi A. Does my county have an epidemic? Estimates show hidden transmission. The New York Times. 2020. Available: https://www.nytimes.com/interactive/2020/04/03/us/coronavirus-county-epidemics.html
  19. Social Listening. In: Meltwater [Internet]. 2022 [cited 15 Dec 2022]. Available: https://www.meltwater.com/en/products/social-media-monitoring
  20. Zeleny J. Why these 8 Republican governors are holding out on statewide stay-at-home orders. In: CNN [Internet]. 2020 [cited 15 Dec 2022]. Available: https://www.cnn.com/2020/04/04/politics/republican-governors-stay-at-home-orders-coronavirus/index.html
  21. Badr HS, Du H, Marshall M, Dong E, Squire MM, Gardner LM. Association between mobility patterns and COVID-19 transmission in the USA: a mathematical modelling study. Lancet Infect Dis. 2020;20: 1247–1254. pmid:32621869
  22. Xiong C, Hu S, Yang M, Luo W, Zhang L. Mobile device data reveal the dynamics in a positive relationship between human mobility and COVID-19 infections. Proc Natl Acad Sci U S A. 2020;117: 27087–27089. pmid:33060300
  23. Moreland A, Herlihy C, Tynan MA, Sunshine G, McCord RF, Hilton C, et al. Timing of state and territorial COVID-19 stay-at-home orders and changes in population movement—United States, March 1–May 31, 2020. Morb Mortal Wkly Rep. 2020;69: 1198–1203. pmid:32881851
  24. Audirac M, Tec M, Meyers LA, Fox S, Zigler C. Impact of the timing of stay-at-home orders and mobility reductions on first-wave COVID-19 deaths in US counties. Am J Epidemiol. 2022;191: 900–907. pmid:35136914
  25. Fauver JR, Petrone ME, Hodcroft EB, Shioda K, Ehrlich HY, Watts AG, et al. Coast-to-coast spread of SARS-CoV-2 during the early epidemic in the United States. Cell. 2020;181: 990–996.e5. pmid:32386545
  26. Pei S, Kandula S, Shaman J. Differential effects of intervention timing on COVID-19 spread in the United States. Sci Adv. 2020;6: 1–10. pmid:33158911
  27. Du Z, Xu X, Wang L, Fox SJ, Cowling BJ, Galvani AP, et al. Effects of proactive social distancing on COVID-19 outbreaks in 58 cities, China. Emerg Infect Dis. 2020;26: 2267–2269. pmid:32516108
  28. Lyu W, Wehby GL. Community use of face masks and COVID-19: evidence from a natural experiment of state mandates in the US. Health Aff. 2020;39: 1419–1425. pmid:32543923
  29. Ragonnet-Cronin M, Boyd O, Geidelberg L, Jorgensen D, Nascimento FF, Siveroni I, et al. Genetic evidence for the association between COVID-19 epidemic severity and timing of non-pharmaceutical interventions. Nat Commun. 2021;12: 1–7. pmid:33846321
  30. Adam DC, Wu P, Wong JY, Lau EHY, Tsang TK, Cauchemez S, et al. Clustering and superspreading potential of SARS-CoV-2 infections in Hong Kong. Nat Med. 2020;26: 1714–1719. pmid:32943787
  31. Zhang Y, Li Y, Wang L, Li M, Zhou X. Evaluating transmission heterogeneity and super-spreading event of COVID-19 in a metropolis of China. Int J Environ Res Public Health. 2020;17. pmid:32456346
  32. Endo A, Abbott S, Kucharski AJ, Funk S. Estimating the overdispersion in COVID-19 transmission using outbreak sizes outside China. Wellcome Open Res. 2020;5: 67. pmid:32685698
  33. Pitzer VE, Chitwood M, Havumaki J, Menzies NA, Perniciaro S, Warren JL, et al. The impact of changes in diagnostic testing practices on estimates of COVID-19 transmission in the United States. Am J Epidemiol. 2021;190: 1908–1917. pmid:33831148
  34. Larremore DB, Wilder B, Lester E, Shehata S, Burke JM, Hay JA, et al. Test sensitivity is secondary to frequency and turnaround time for COVID-19 screening. Sci Adv. 2021;7: 1–11. pmid:33219112
  35. Cowling BJ, Ali ST, Ng TWY, Tsang TK, Li JCM, Fong MW, et al. Impact assessment of non-pharmaceutical interventions against coronavirus disease 2019 and influenza in Hong Kong: an observational study. Lancet Public Health. 2020;5: e279–e288. Available: https://doi.org/10.1016/S2468-2667(20)30090-6
  36. United States Census Bureau. TIGER/Line files technical documentation. 2019. Available: https://www.census.gov/programs-surveys/geography/technical-documentation/complete-technical-documentation/tiger-geo-line.2019.html
  37. Walker K, Herman M. tidycensus: load US census boundary and attribute data as "tidyverse" and "sf"-ready data frames. 2022. Available: https://cran.r-project.org/package=tidycensus
  38. United States Census Bureau. County population totals: 2010–2019. 2019 [cited 15 Dec 2022]. Available: https://www.census.gov/data/tables/time-series/demo/popest/2010s-counties-total.html
  39. Koo JR, Cook AR, Park M, Sun Y, Sun H, Lim JT, et al. Interventions to mitigate early spread of SARS-CoV-2 in Singapore: a modelling study. Lancet Infect Dis. 2020;20: 678–688. pmid:32213332
  40. Smith TP, Flaxman S, Gallinat AS, Kinosian SP, Stemkovski M, Juliette H, et al. Temperature and population density influence SARS-CoV-2 transmission in the absence of nonpharmaceutical interventions. Proc Natl Acad Sci U S A. 2021;118. pmid:34103391
  41. Getz WM, Dougherty ER. Discrete stochastic analogs of Erlang epidemic models. J Biol Dyn. 2018;12: 16–38. pmid:29157162
  42. Wearing HJ, Rohani P, Keeling MJ. Appropriate models for the management of infectious diseases. PLoS Med. 2005;2: 0621–0627. pmid:16013892
  43. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2019. Available: https://www.r-project.org/
  44. The Atlantic. The COVID tracking project US historical data. In: COVID Tracking Project [Internet]. 2021 [cited 15 Dec 2022]. Available: https://covidtracking.com/data/national
  45. Bolker B, R Development Core Team. bbmle: tools for general maximum likelihood estimation. 2022. Available: https://cran.r-project.org/package=bbmle