Operational analysis for COVID-19 testing: Determining the risk from asymptomatic infections

  • Marc Mangel

    Roles Conceptualization, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    msmangel@ucsc.edu

    Affiliations Department of Biology, University of Bergen, Bergen, Norway, Department of Applied Mathematics, University of California Santa Cruz, Santa Cruz, CA, United States of America, Puget Sound Institute, University of Washington Tacoma, Tacoma, WA, United States of America

Abstract

Testing remains a key tool for managing health care and making health policy during the coronavirus pandemic, and it will probably be important in future pandemics. Because of false negative and false positive tests, the observed fraction of positive tests (the surface positivity) is generally different from the fraction of infected individuals (the incidence rate of the disease). This paper first illustrates a previous method for translating surface positivity into a point estimate for the incidence rate, then into an appropriate range of values for the incidence rate consistent with the model and data (the test range), and finally into the risk (the probability of including one infected individual) associated with groups of different sizes. The method is then extended to include asymptomatic infections. To do so, the process of testing is modeled using both analysis and Monte Carlo simulation. Doing so shows that it is possible to determine point estimates for the fraction of infected and symptomatic individuals, the fraction of uninfected and symptomatic individuals, and the ratio of infected asymptomatic individuals to infected symptomatic individuals. Inclusion of symptom status generalizes the test range from an interval to a region in the plane determined by the incidence rate and the ratio of asymptomatic to symptomatic infections; likelihood methods can be used to determine the contour of the test region. Points on this contour can be used to compute the risk (defined as the probability of including one asymptomatic infected individual) in groups of different sizes.
These results have operational implications that include: positivity rate is not incidence rate; symptom status at testing can provide valuable information about asymptomatic infections; collecting information on time since putative virus exposure at testing is valuable for determining point estimates and test ranges; risk is a graded (rather than binary) function of group size; and because the information provided by testing becomes more accurate with more tests but at a decreasing rate, it is possible to over-test fixed spatial regions. The paper concludes with limitations of the method and directions for future work.

Introduction

Entering the third year of the 2019 coronavirus disease (henceforth COVID-19) pandemic, it is clear that the world was woefully underprepared, in many different ways, for dealing with it. The current pandemic is an illustration of the natural evolutionary play [1], so one should expect future pandemics and lose no time in preparing for them while dealing with the present one.

Some authors have suggested that it is appropriate to prepare for the next pandemic as one prepares for war [2–5]. Operational analysis grew out of the scientific approach to operational questions in World War II [6–9]. One of the key tenets of operational analysis is to model the process as well as the data [7, 10]. Among the advantages, when the process is modeled, one knows the true state of the world, which allows assessment of the quality of the analyses by comparison of analytical outcomes with a known situation. This gives confidence that the methods will work when the true state of the world is unknown.

Models for dynamics and control of the disease, prioritizing hospital care, and setting policy [11–18] require information about the health status of the population. This is determined by testing for infection, which thus emerged as a crucial component of managing health policy during the current pandemic and will probably be key in future pandemics [19]. For example, the Global Influenza Surveillance and Response Network (the “flu network” [20, 21]) established in 1952 played a key role in the early responses to the COVID-19 pandemic. The time is now to prepare for future testing.

Testing is complicated: an individual who is in an early stage of the infection may give a false negative test, infected individuals may be asymptomatic and thus not tested, and symptomatic but uninfected individuals may give false positive test results. Thus, test errors involve both false negative tests, in which an infected individual tests negative, and false positive tests, in which an uninfected individual tests positive [22–27]. The probabilities of these errors are called the false negative probability [28], denoted here by pFN, and the false positive probability, denoted here by pFP. It may one day be possible to drive the false positive probability to zero with improved specificity of tests, but the ontogeny of the disease within an individual means that there will always be false negative tests [25, 26].

A starting point for the interpretation of testing results is to envision that a population is divided into infected (antigen positive) and uninfected (antigen negative) individuals, with the goal of estimating the fraction of infected individuals (the incidence rate) from the number of positive results P when T tests are given. Because of both kinds of test errors, the surface positivity rate P/T (which is observed; henceforth simply called positivity rate) will generally differ from the incidence rate (which is not observed). It is natural and intuitive to ask for the unobserved incidence rate that is most likely given the test results; this is called the Maximum Likelihood Estimate (MLE) of the incidence rate.

Brown and Mangel [29] and Mangel and Brown [30] (also see [31, 32], where similar methods are used) show that the Maximum Likelihood Estimate for the incidence rate, denoted by $\hat{f}$, is

$$\hat{f} = \frac{P/T - p_{FP}}{1 - p_{FN} - p_{FP}} \tag{1}$$

which is to be interpreted as $\hat{f} = 0$ if the right side of Eq 1 is negative. As will be explained below, application of Bayesian methods allows determination of a probability distribution for the incidence rate when the right side of Eq 1 is negative.
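As a concrete illustration of Eq 1, the following is a minimal Python sketch (the paper's own computations were done in R). The function name `mle_incidence` and the upper clamp at 1 are assumptions introduced here for illustration; the rule that a negative right side is read as an estimate of zero comes from the text above.

```python
def mle_incidence(P, T, p_fn, p_fp):
    """Point estimate of the incidence rate from P positives in T tests (Eq 1)."""
    if T <= 0:
        raise ValueError("T must be positive")
    denom = 1.0 - p_fn - p_fp
    if denom <= 0.0:
        raise ValueError("test errors must satisfy p_fn + p_fp < 1")
    raw = (P / T - p_fp) / denom
    # A negative right side of Eq 1 is interpreted as an estimate of zero;
    # the clamp at 1 is an extra safeguard assumed here, not part of Eq 1.
    return min(1.0, max(0.0, raw))


# Positivity of 8.5% with p_fn = 0.25 and p_fp = 0.05
print(mle_incidence(850, 10_000, 0.25, 0.05))
# Positivity below p_fp gives the boundary estimate
print(mle_incidence(4, 100, 0.25, 0.05))
```

With these test errors, a true incidence rate of 0.05 produces an expected positivity of 0.05 × 0.75 + 0.95 × 0.05 = 0.085, which is why the first call uses 850 positives in 10,000 tests.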

In addition to a point estimate for the incidence rate, it is valuable to have a range of incidence rates that are consistent with the model and the data, since then one can bound the incidence rate and its associated risk of further infection (Eqs 3 and 4 below). That is, forecasting for a pandemic can be improved by using a predictive distribution rather than the point estimate in Eq 1 (cf. [33]).

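In [29, 30], we show that an appropriate test range, denoted by $\hat{R}_T$, is

$$\hat{R}_T = \frac{2 \cdot 1.96}{1 - p_{FN} - p_{FP}} \sqrt{\frac{\hat{p}(1 - \hat{p})}{T}} \tag{2}$$

where $\hat{p} = P/T$ is the positivity rate.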

McElreath [34, p. 54] describes an equation such as Eq 2 as the 95% compatibility interval, avoiding the undesired implications of words such as “confidence” or “credible” [35]. The estimated range $\hat{R}_T$ is symmetrically distributed around the true range, with very small mean error between the two [30], so that lower and upper limits for the estimated incidence rate are $\hat{f}_{\min} = \hat{f} - \hat{R}_T/2$ and $\hat{f}_{\max} = \hat{f} + \hat{R}_T/2$.
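The limits of the compatibility interval can be sketched in Python as follows. This sketch assumes the Gaussian approximation to the binomial distribution, under which the test range is 2 · 1.96 · sqrt(p(1 − p)/T)/(1 − pFN − pFP) with p = P/T; the function name `incidence_limits` and the clamping of the limits to [0, 1] are hypothetical choices made here, not taken from the paper.

```python
import math


def incidence_limits(P, T, p_fn, p_fp, z=1.96):
    """Return (f_hat, range_hat, lower, upper) under a Gaussian approximation."""
    denom = 1.0 - p_fn - p_fp
    p_hat = P / T  # observed positivity rate
    f_hat = max(0.0, (p_hat - p_fp) / denom)
    # Width of the interval: 2 * z standard errors of p_hat, rescaled to f
    range_hat = 2.0 * z * math.sqrt(p_hat * (1.0 - p_hat) / T) / denom
    lower = max(0.0, f_hat - range_hat / 2.0)
    upper = min(1.0, f_hat + range_hat / 2.0)
    return f_hat, range_hat, lower, upper


f_hat, range_hat, lower, upper = incidence_limits(212, 2500, 0.25, 0.05)
print(f_hat, range_hat, lower, upper)
```

The example uses T = 2500 tests and 212 positives, close to the positivity expected when the true incidence rate is 0.05, pFN = 0.25, and pFP = 0.05.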

Mangel and Brown [30] also show how likelihood methods can be used to obtain a test range when positivity is 0 (so that $\hat{f} = 0$). In this paper, we will show how to determine the test range when 0 < P/T < pFP. However, for the remainder of this section, we assume that P/T > pFP so that the estimate of incidence rate is strictly positive.

Eqs 1 and 2 lead to the operational recommendation that one should stratify testing data according to test errors. When this is not possible, one should stratify tests according to the estimated time since exposure, assign best estimates to the test errors, and conduct sensitivity analyses of the results.

Test results can play an important role in policymaking because they can be used to determine the risk of spreading infection associated with groups of different sizes. Doing so requires a definition of risk. We define the risk to be the probability of including at least one infected individual in a group of specified size. The risk associated with a group of size h when the estimate for incidence rate is $\hat{f}$ is [29, 30]

$$\mathcal{R}(h \mid \hat{f}) = 1 - (1 - \hat{f})^{h} \tag{3}$$

Eq 3 allows us to explore the risk ramifications of groups of different sizes. Were the true incidence rate known, we would replace $\hat{f}$ by $f_t$ (Fig 1). Fig 1 can be used to determine the risk associated with groups of different sizes by choosing a group size on the x-axis, drawing a vertical line to intersect the curve, drawing a horizontal line that intersects the y-axis, and reading off the level of risk. When the true value of the incidence rate is unknown, we create upper and lower bounds for risk by generating curves similar to Fig 1 using the lower and upper bounds $\hat{f}_{\min}$ and $\hat{f}_{\max}$ of the incidence rate, as in Brown and Mangel [29, Fig 2].
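The graphical procedure just described amounts to evaluating the risk, one minus the probability that none of the h group members is infected, at a chosen group size. A minimal Python sketch follows; the name `group_risk` is hypothetical, and members of the group are assumed to be infected independently with probability f.

```python
def group_risk(h, f):
    """Probability that a group of size h includes at least one infected individual."""
    if h < 0 or not 0.0 <= f <= 1.0:
        raise ValueError("need h >= 0 and 0 <= f <= 1")
    # (1 - f)**h is the probability that no member of the group is infected
    return 1.0 - (1.0 - f) ** h


# Risk for several group sizes when the incidence rate is 0.05, as in Fig 1
for h in (2, 10, 25, 50, 100):
    print(h, round(group_risk(h, 0.05), 3))
```

Calling the function with the lower and upper limits of the incidence rate in place of 0.05 gives the bounding risk curves.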

Fig 1. The risk of groups of different sizes (Eq 3) when the true fraction of infected individuals is ft = 0.05 (i.e., we set $\hat{f} = f_t$ in Eq 3).

This figure can be used to determine the risk associated with groups of different sizes (ranging from 2 to 100) by choosing a group size on the x-axis, drawing a vertical line to intersect the curve, drawing a horizontal line that intersects the y-axis, and reading off the level of risk.

https://doi.org/10.1371/journal.pone.0281710.g001

We can invert Eq 3 by specifying a level of acceptable risk $r$ and then solving for the group size $h(r \mid \hat{f})$ consistent with that acceptable risk when the estimate of incidence rate is $\hat{f}$:

$$h(r \mid \hat{f}) = \frac{\log(1 - r)}{\log(1 - \hat{f})} \tag{4}$$

Replacing the estimate of incidence rate by its maximum and minimum values, $\hat{f}_{\max}$ and $\hat{f}_{\min}$ (the upper and lower limits of the test range), in Eq 4 allows us to bound the acceptable group size consistent with the level of acceptable risk.
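A minimal Python sketch of this inversion and of the resulting bounds is given below. It assumes that risk is one minus the probability that no group member is infected, so the group size is log(1 − risk)/log(1 − f); the function names and the numerical limits 0.034 and 0.065 are hypothetical values chosen for illustration.

```python
import math


def acceptable_group_size(risk, f):
    """Group size at which the risk of at least one infected member equals `risk`."""
    if not 0.0 < risk < 1.0:
        raise ValueError("risk must lie strictly between 0 and 1")
    if not 0.0 < f < 1.0:
        raise ValueError("incidence rate must lie strictly between 0 and 1")
    return math.log(1.0 - risk) / math.log(1.0 - f)


def group_size_bounds(risk, f_min, f_max):
    """Smallest and largest acceptable group sizes over the interval [f_min, f_max]."""
    # A higher incidence rate permits only a smaller group at the same risk
    return acceptable_group_size(risk, f_max), acceptable_group_size(risk, f_min)


smallest, largest = group_size_bounds(0.2, 0.034, 0.065)
print(smallest, acceptable_group_size(0.2, 0.05), largest)
```

In practice the real-valued result would be rounded down to a whole number of individuals.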

In Fig 2, I show 16 realizations of acceptable group size using the simulation methods described in [30]. The dotted line shows the group size consistent with the level of acceptable risk when the incidence rate is ft, the solid black line is the group size from Eq 4 using the point estimate $\hat{f}$, and the red and blue lines are the group sizes using the maximum and minimum estimates for incidence rate, $\hat{f}_{\max}$ and $\hat{f}_{\min}$, respectively. One key observation is that the group size determined if the incidence rate were known (dotted line) falls between those determined from the upper and lower limits of incidence rate determined by the test range.

Fig 2. Sixteen realizations of the group size as a function of acceptable risk using the simulation methods described in [30].

In all panels, the number of tests is T = 2500, the true incidence rate is ft = 0.05, and the probabilities of false negative and false positive tests are 0.25 and 0.05. The dotted line shows the group size consistent with the level of acceptable risk when the incidence rate is ft, the solid black line is the group size using the estimate in Eq 1 determined using the positivity rate from the individual realization of the simulation, and the red and blue lines are the group sizes using the maximum and minimum estimates for incidence rate, $\hat{f}_{\max}$ and $\hat{f}_{\min}$, respectively. One key observation is that the group size determined if the incidence rate were known (dotted line) falls between those determined from the upper and lower limits of incidence rate determined by the test range.

https://doi.org/10.1371/journal.pone.0281710.g002

It is also now well established that asymptomatic infected individuals can readily transmit infection [36–47]. Birx [2] emphasizes the role of untested asymptomatic individuals in the spread of the disease. Because of asymptomatic cases, policies that exclude symptomatic individuals from groups may still have considerable risk of including infected individuals who can transmit the disease.

The first purpose of this paper is to show how to obtain the test range when there is no information on symptoms and positivity is less than the probability of a false positive test. The second purpose of this paper is to generalize Eqs 1–4 and develop the analogue of Fig 1 when asymptomatic and symptomatic individuals are identified at the time of testing.

When there is information on symptoms (Fig 3), a fraction ft of the population is infected and symptomatic; a fraction ρtft is infected and asymptomatic; a fraction gt is uninfected but symptomatic; and the remaining fraction, 1 − ft(1 + ρt) − gt, is neither infected nor symptomatic. Infected individuals have probabilities of a false negative test, denoted by pSFN and pAFN where the subscript S and A correspond to symptomatic and asymptomatic individuals, respectively. Uninfected individuals have probabilities of a false positive test denoted by pSFP and pAFP, respectively. Using testing information, we seek point estimates and the analogue of test ranges for the unknown incidence rates and ratio of asymptomatic to symptomatic cases.

Fig 3. The population divided into four classes according to infection and symptom status.

A fraction ft of the population is symptomatic and infected (antigen positive); such individuals have a probability of a false negative test pSFN. A fraction gt of the population is symptomatic but not infected; such individuals have a probability of a false positive test pSFP. A fraction ρtft of the population is infected but not symptomatic; such individuals have a probability of a false negative test pAFN. Finally, a fraction 1 − ft − gt − ρtft = 1 − ft(1 + ρt) − gt of the population is neither infected nor symptomatic; such individuals have a probability of a false positive test pAFP. The subscript t indicates that these three parameters characterize the true state of the world; however, none of them is observable.

https://doi.org/10.1371/journal.pone.0281710.g003

Materials and methods

Determining test range with no information on symptoms and positivity less than the probability of a false positive test

In this case, as with Eqs 1–4, there is a single unknown incidence rate, which we continue to denote by ft. The methods used are generalized when there is information on symptoms, so this section is a warm-up to the harder problem.

The probability of obtaining a positive test when the incidence rate is f is

$$p_+(f) = f(1 - p_{FN}) + (1 - f)\,p_{FP} \tag{5}$$

The first term on the right hand side of Eq 5 corresponds to individuals who are infected and have a true positive test; the second term corresponds to individuals who are not infected and have a false positive test.

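When T tests are given, the number of positive tests P is binomially distributed with parameters T and p+(f) [30–32], which we write as $P \sim \mathcal{B}(\cdot \mid T, p_+(f))$, where $\mathcal{B}(k \mid T, p) = \binom{T}{k} p^{k} (1 - p)^{T - k}$. The likelihood of an incidence rate f given the test data P and T has the same form [30–32], but is a function of the incidence rate f conditioned on the values of the test data:

$$\mathcal{L}(f \mid P, T) = \binom{T}{P}\, p_+(f)^{P} \left[1 - p_+(f)\right]^{T - P} \tag{6}$$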

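In S1 Section in S1 File, we show that the maximum likelihood estimate $\hat{f}$ for the fraction of the population infected satisfies

$$\left[\frac{P}{p_+(\hat{f})} - \frac{T - P}{1 - p_+(\hat{f})}\right] p_+'(\hat{f}) = 0 \tag{7}$$

where $p_+'(f) = dp_+(f)/df = 1 - p_{FN} - p_{FP}$.

Since $p_+'(f) \neq 0$ whenever $p_{FN} + p_{FP} \neq 1$, we conclude that if there is an internal maximum of the likelihood (i.e. $0 < \hat{f} < 1$), it must occur when $P/T = p_+(\hat{f})$. Solving this equation for $\hat{f}$ gives Eq 1. When $P/T < p_{FP}$, so that $P < T p_+(f)$ for every $f \geq 0$, we set $\hat{f} = 0$ and arrive at the nettlesome case of this subsection.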

In Fig 4, I show the logarithm of the likelihood (the log-likelihood function) as a function of incidence rate f for four values of positivity. In Fig 4(A), P/T = 0.075 and the peak of the likelihood is clearly away from the boundary f = 0. As positivity declines but stays larger than pFP, as in Fig 4(B) and Fig 4(C), there is still an internal peak of the likelihood function. However, when positivity falls below pFP, as in Fig 4(D), the maximum of the log-likelihood function occurs on the boundary.
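The move of the maximum from the interior to the boundary can be checked numerically. The Python sketch below (the paper's own computations were done in R) evaluates the binomial log-likelihood on a grid of incidence rates for the four positivity values of Fig 4; the function names and grid resolution are assumptions made here, and `math.lgamma` is used so that non-integer values of P, such as 7.5 positives in 100 tests, can be handled.

```python
import math


def log_likelihood(f, P, T, p_fn, p_fp):
    """Binomial log-likelihood of incidence rate f given P positives in T tests."""
    # Probability of a positive test; assumes p_fn > 0 and p_fp > 0 so 0 < p < 1
    p = f * (1.0 - p_fn) + (1.0 - f) * p_fp
    log_choose = math.lgamma(T + 1) - math.lgamma(P + 1) - math.lgamma(T - P + 1)
    return log_choose + P * math.log(p) + (T - P) * math.log(1.0 - p)


def grid_mle(P, T, p_fn, p_fp, n=2001):
    """Incidence rate on an evenly spaced grid of [0, 1] that maximizes the log-likelihood."""
    grid = [i / (n - 1) for i in range(n)]
    return max(grid, key=lambda f: log_likelihood(f, P, T, p_fn, p_fp))


for positivity in (0.075, 0.06, 0.0525, 0.04):
    print(positivity, grid_mle(positivity * 100, 100, 0.25, 0.05))
```

For the three positivity values above pFP = 0.05 the grid maximum should sit near (P/T − pFP)/(1 − pFN − pFP), while for positivity 0.04 it falls on the boundary f = 0.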

Fig 4. Behavior of the log-likelihood function.

Shown is the log-likelihood function (the logarithm of the right side of Eq 6) as the positivity rate declines when pFN = 0.25, pFP = 0.05 and T = 100 tests are administered for positivity (A) 0.075, (B) 0.06, (C) 0.0525, and (D) 0.04.

https://doi.org/10.1371/journal.pone.0281710.g004

We convert from a likelihood to a probability distribution by assuming a uniform prior on f and use Bayes’s theorem to write the probability density for f given the test data (also see S2 Section in S1 File):

$$\varphi(f \mid P, T) = \frac{\mathcal{L}(f \mid P, T)}{\int_0^1 \mathcal{L}(f' \mid P, T)\, df'} \tag{8}$$

Although the denominator in Eq 8 can be written in terms of the classical beta function [48], it is most simply viewed as a constant obtained by using a very fine discretization of the interval [0, 1].

When the maximum of the likelihood occurs at the boundary f = 0, the probability density φ(f|P, T) will also have its maximum at the boundary. In this case, the test range is no longer symmetrical but is an interval [0, f0.95], where f0.95 is the value of incidence rate such that $\int_0^{f_{0.95}} \varphi(f \mid P, T)\, df = 0.95$ (or the equivalent when a summation instead of an integral is used in Eq 8).
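The discretized version of this calculation can be sketched in Python as follows. The grid size, the function names `posterior_on_grid` and `upper_limit`, and the example data (10 positives in 400 tests and 40 positives in 1600 tests, both with positivity 0.025) are assumptions chosen for illustration; the binomial coefficient is omitted because it cancels in the normalization.

```python
import math


def posterior_on_grid(P, T, p_fn, p_fp, n=10_001):
    """Normalized likelihood (posterior under a uniform prior) on a grid of [0, 1]."""
    grid = [i / (n - 1) for i in range(n)]
    log_like = []
    for f in grid:
        p = f * (1.0 - p_fn) + (1.0 - f) * p_fp  # probability of a positive test
        log_like.append(P * math.log(p) + (T - P) * math.log(1.0 - p))
    peak = max(log_like)
    # Subtract the peak before exponentiating to avoid numerical underflow
    weights = [math.exp(v - peak) for v in log_like]
    total = sum(weights)
    return grid, [w / total for w in weights]


def upper_limit(grid, mass, level=0.95):
    """Smallest grid value whose cumulative posterior mass reaches `level`."""
    cumulative = 0.0
    for f, m in zip(grid, mass):
        cumulative += m
        if cumulative >= level:
            return f
    return grid[-1]


f95_400 = upper_limit(*posterior_on_grid(10, 400, 0.25, 0.05))
f95_1600 = upper_limit(*posterior_on_grid(40, 1600, 0.25, 0.05))
print(f95_400, f95_1600)
```

At the same positivity, the larger number of tests should give the shorter interval [0, f0.95].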

Analysis when there is information on symptoms

The operational situation with information on symptoms.

We assume that T tests are administered to a population in which some individuals are symptomatic and others are not (recorded at the time of testing) and each individual tested has either a positive or negative test for coronavirus. As described in Fig 3, there are now four classes of individuals:

  • A fraction ft of the population is symptomatic and infected (antigen positive); these individuals have a probability of a false negative test pSFN.
  • A fraction gt of the population is symptomatic but not infected; these individuals have a probability of a false positive test pSFP.
  • A fraction ρtft of the population is infected but not symptomatic; these individuals have a probability of a false negative test pAFN.
  • The remaining fraction of the population, 1 − ft − gt − ρtft = 1 − ft(1 + ρt) − gt, is neither infected nor symptomatic; these individuals have a probability of a false positive test pAFP.

When this situation holds, three kinds of test data are generated:

  • The number P of positive tests.
  • The number TS of symptomatic individuals.
  • The number PS of symptomatic individuals who tested positive.

Point estimates for the fractions of infected and uninfected symptomatic individuals and the ratio of asymptomatic to symptomatic infected individuals.

The following causal chain characterizes the operation of testing:

  • The total number of tests, T, leads to the number of symptomatic individuals in the sample, TS.
  • TS leads to the number of positive tests of symptomatic individuals, PS.
  • T, TS, and PS combined lead to the remaining positive results, P − PS of T − TS tests from asymptomatic individuals.

As above, we let $\mathcal{B}(\cdot \mid N, p)$ denote a binomial distribution with number of samples N and probability of a positive event p, where the dot can run from 0 (no positive event) to N (only positive events). If pS denotes the probability of sampling a symptomatic individual, pS+ the probability of obtaining a positive test from a symptomatic individual, and pA+ the probability of obtaining a positive test from an asymptomatic individual, the test results have distributions

$$T_S \sim \mathcal{B}(\cdot \mid T, p_S) \tag{9}$$

$$P_S \sim \mathcal{B}(\cdot \mid T_S, p_{S+}) \tag{10}$$

$$P - P_S \sim \mathcal{B}(\cdot \mid T - T_S, p_{A+}) \tag{11}$$

The probabilities on the right sides of Eqs 9–11 are constructed from the assumptions summarized in Fig 3. The fraction of individuals who are symptomatic is ft + gt so that

$$p_S = f_t + g_t \tag{12}$$

Since pS+ is the probability that an individual tests positive given that the individual is symptomatic, from the definition of conditional probability

$$p_{S+} = \frac{f_t(1 - p_{SFN}) + g_t\,p_{SFP}}{f_t + g_t} \tag{13}$$

The probability that an individual tests positive given that the individual is asymptomatic is computed similarly:

$$p_{A+} = \frac{\rho_t f_t(1 - p_{AFN}) + \left[1 - f_t(1 + \rho_t) - g_t\right] p_{AFP}}{1 - f_t - g_t} \tag{14}$$

We let $\hat{f}$ and $\hat{g}$ denote the maximum likelihood estimates (MLEs) for the fraction of the population that is infected and symptomatic or not infected and symptomatic, respectively, and $\hat{\rho}$ the MLE for the ratio of the fraction that is infected and asymptomatic to that which is infected and symptomatic.

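When a random variable has the binomial distribution $\mathcal{B}(\cdot \mid N, p)$, given K positive events, the MLE for p is $\hat{p} = K/N$ (see S1 Section in S1 File), so that the MLEs for the probabilities in Eqs 9–11 are TS/T, PS/TS, and (P − PS)/(T − TS), from which we conclude

$$\hat{f} + \hat{g} = \frac{T_S}{T} \tag{15}$$

$$\frac{\hat{f}(1 - p_{SFN}) + \hat{g}\,p_{SFP}}{\hat{f} + \hat{g}} = \frac{P_S}{T_S} \tag{16}$$

$$\frac{\hat{\rho}\hat{f}(1 - p_{AFN}) + \left[1 - \hat{f}(1 + \hat{\rho}) - \hat{g}\right] p_{AFP}}{1 - \hat{f} - \hat{g}} = \frac{P - P_S}{T - T_S} \tag{17}$$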

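Eqs 15 and 16 are independent of $\hat{\rho}$, and Eq 16 can be rewritten as

$$\hat{g} = \frac{1 - p_{SFN} - P_S/T_S}{P_S/T_S - p_{SFP}}\,\hat{f} \tag{18}$$

which we write as $\hat{g} = c_1(P_S, T_S)\hat{f}$, where c1(PS, TS) is the combination of terms multiplying $\hat{f}$ on the right side of Eq 18 and the test errors are suppressed. With this notation, we substitute into Eq 15 and solve to obtain

$$\hat{f} = \frac{T_S}{T\left[1 + c_1(P_S, T_S)\right]} \tag{19}$$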

Thus, both $\hat{f}$ and $\hat{g}$ are known; they are random variables because PS and TS are random variables.

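We now rewrite Eq 17 as

$$\hat{\rho}\hat{f}(1 - p_{AFN} - p_{AFP}) + (1 - \hat{f} - \hat{g})\,p_{AFP} = (1 - \hat{f} - \hat{g})\,\frac{P - P_S}{T - T_S} \tag{20}$$

let $c_2 = 1 - p_{AFN} - p_{AFP}$, and solve for $\hat{\rho}$ to obtain

$$\hat{\rho} = \frac{(1 - \hat{f} - \hat{g})\left[\dfrac{P - P_S}{T - T_S} - p_{AFP}\right]}{c_2\,\hat{f}} \tag{21}$$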

Since the right side of Eq 21 depends on the test data TS, PS, and P, $\hat{\rho}$ is also a random variable.
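The chain of substitutions above can be collected into a short Python sketch. It uses algebraically simplified forms that follow from matching the observed proportions TS/T, PS/TS, and (P − PS)/(T − TS) to their model probabilities, in particular the identity 1 − f − g = (T − TS)/T for the estimates; the function name `point_estimates` and the example data are hypothetical.

```python
def point_estimates(T, T_S, P_S, P, p_sfn, p_sfp, p_afn, p_afp):
    """Return (f_hat, g_hat, rho_hat) from the test data T, T_S, P_S, and P."""
    denom_s = T * (1.0 - p_sfn - p_sfp)
    # Infected and symptomatic fraction
    f_hat = (P_S - p_sfp * T_S) / denom_s
    # Uninfected and symptomatic fraction
    g_hat = ((1.0 - p_sfn) * T_S - P_S) / denom_s
    if f_hat <= 0.0:
        raise ValueError("rho_hat is undefined when f_hat is not positive")
    c2 = 1.0 - p_afn - p_afp
    # Ratio of asymptomatic to symptomatic infections
    rho_hat = ((P - P_S) - p_afp * (T - T_S)) / (T * c2 * f_hat)
    return f_hat, g_hat, rho_hat


f_hat, g_hat, rho_hat = point_estimates(1500, 135, 58, 118, 0.25, 0.03, 0.5, 0.003)
print(f_hat, g_hat, rho_hat)
```

The example data are close to what would be expected from 1500 tests when the true values are ft = 0.05, gt = 0.04, and ρt = 1.5, so the three estimates should land near those values.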

Eqs 18, 19 and 21 generalize Eq 1 to the case in which symptomatic and asymptomatic individuals are identified at the time of testing. We have thus generalized the method in [29, 30] to obtain point estimates of the fractions of the population of infected and symptomatic, uninfected and symptomatic, and infected and asymptomatic individuals. We next explore the properties of these point estimates and then generalize the notion of test range and compute the risk that groups of different sizes include asymptomatic infected individuals.

The means of the estimates.

We compute the means of the estimates, continuing to use ft, gt, and ρt to denote their true values, with two goals. We explore 1) whether the estimates in Eqs 18, 19 and 21 are unbiased, in the sense that their expectations (over the stochastic sampling process) are the underlying true values generating the data, and 2) if there is a bias, how to characterize it.

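The mean of $\hat{f}$.

We begin by rewriting Eq 19 as

$$\hat{f} = \frac{T_S}{T} \cdot \frac{1}{1 + c_1(P_S, T_S)} \tag{22}$$

In S2 Section in S1 File, I show that $1 + c_1(P_S, T_S) = \dfrac{1 - p_{SFN} - p_{SFP}}{P_S/T_S - p_{SFP}}$ so that we can rewrite Eq 22 as

$$\hat{f} = \frac{T_S}{T} \cdot \frac{P_S/T_S - p_{SFP}}{1 - p_{SFN} - p_{SFP}} = \frac{P_S - p_{SFP}\,T_S}{T(1 - p_{SFN} - p_{SFP})} \tag{23}$$

Since the denominator in Eq 23 is a constant, the expectation of $\hat{f}$ is

$$E[\hat{f}] = \frac{E[P_S] - p_{SFP}\,E[T_S]}{T(1 - p_{SFN} - p_{SFP})} \tag{24}$$

In the S2 Section in S1 File, I show that $E[T_S] = T(f_t + g_t)$ and $E[P_S] = T\left[f_t(1 - p_{SFN}) + g_t\,p_{SFP}\right]$, from which it follows that $E[\hat{f}] = f_t$; the expected value of $\hat{f}$ is the true value that underlies the testing process.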

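The mean of $\hat{g}$.

We begin by multiplying the top and bottom of the right side of Eq 18 by TS to obtain

$$\hat{g} = \frac{(1 - p_{SFN})\,T_S - P_S}{P_S - p_{SFP}\,T_S}\,\hat{f} \tag{25}$$

and now use the version of $\hat{f}$ on the far right side of Eq 23 to obtain

$$\hat{g} = \frac{(1 - p_{SFN})\,T_S - P_S}{P_S - p_{SFP}\,T_S} \cdot \frac{P_S - p_{SFP}\,T_S}{T(1 - p_{SFN} - p_{SFP})} = \frac{(1 - p_{SFN})\,T_S - P_S}{T(1 - p_{SFN} - p_{SFP})} \tag{26}$$

Taking expectations on the far right side of Eq 26, we obtain

$$E[\hat{g}] = \frac{(1 - p_{SFN})\,E[T_S] - E[P_S]}{T(1 - p_{SFN} - p_{SFP})} = g_t \tag{27}$$

so that the expected value of $\hat{g}$ is the true value that underlies the testing process.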

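The mean of $\hat{\rho}$.

We begin with Eq 21, rewritten as

$$\hat{\rho} = \frac{1}{c_2}\left[\frac{P - P_S}{T - T_S} - p_{AFP}\right] \frac{1 - \hat{f} - \hat{g}}{\hat{f}} \tag{28}$$

Eq 28 is a nonlinear function of $\hat{f}$ and $\hat{g}$ and involves the quotients of the random variables. We can approximate the expectation of $\hat{\rho}$ using the delta-method [30, 49], which involves Taylor expansion of the right hand side of Eq 28 to second order and then taking expectations. (Details are in the S2 Section in S1 File). The result has the form

$$E[\hat{\rho}] \approx \rho_t + \frac{1}{2} \sum_{i} \sum_{j} \left.\frac{\partial^2 \hat{\rho}}{\partial X_i\, \partial X_j}\right|_{E[X]} \mathrm{Cov}(X_i, X_j) \tag{29}$$

where the $X_i$ are the random variables on the right side of Eq 28, the derivatives are evaluated at their expected values, Cov(X, X) = Var(X), and Var(X) and Cov(X, Y) denote the variance and covariance of random variables X and Y, which arise from the second order Taylor expansion.

The right side of Eq 29 shows that the leading term in the expected value of $\hat{\rho}$ is the true value generating the data and that this is corrected by variances and covariances that account for the nonlinearity in Eq 28.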

Joint properties of $\hat{f}$ and $\hat{\rho}$ via likelihood analysis

Eq 2 for the test range can be obtained by direct manipulation of the relevant random variables [30]. When we separate symptomatic and asymptomatic infections, the compatibility interval for the incidence rate is replaced by a compatibility region (CR) for the incidence rate of symptomatic infected individuals and the ratio of asymptomatic to symptomatic individuals. Because Eqs 18 and 19 are nonlinear in the test results (which are random variables) and Eq 21 is also nonlinear in $\hat{f}$, the analytical approach used in Mangel and Brown [30] is less feasible now.

We develop the analogue of Eq 2 for test range by using likelihood analysis [50–52], exploiting the general property that for a smooth and well-behaved likelihood (which those that follow are), a 95% CR can be approximated by finding the range of variables for which the log-likelihood lies below the peak log-likelihood by no more than 1.96 times the number of free parameters. This is essentially a generalization of the Gaussian approximation to the binomial distribution [53] that leads to Eq 2 [30].

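We denote the test results, viewed as random variables, by $\tilde{T}_S$, $\tilde{P}_S$, and $\tilde{P}$. For any values of f, g, and ρ, the rules of conditional probability imply (suppressing the dependence on T which is known)

$$\Pr(\tilde{T}_S = T_S, \tilde{P}_S = P_S, \tilde{P} = P \mid f, g, \rho) = \Pr(\tilde{T}_S = T_S \mid f, g)\, \Pr(\tilde{P}_S = P_S \mid T_S, f, g)\, \Pr(\tilde{P} = P \mid T_S, P_S, f, g, \rho) \tag{30}$$

Each term on the right side of Eq 30 is a binomial distribution. In particular, for any values of f, g, and ρ,

$$\Pr(\tilde{T}_S = T_S \mid f, g) = \mathcal{B}(T_S \mid T, p_S(f, g)) \tag{31}$$

$$\Pr(\tilde{P}_S = P_S \mid T_S, f, g) = \mathcal{B}(P_S \mid T_S, p_{S+}(f, g)) \tag{32}$$

$$\Pr(\tilde{P} = P \mid T_S, P_S, f, g, \rho) = \mathcal{B}(P - P_S \mid T - T_S, p_{A+}(f, g, \rho)) \tag{33}$$

The probabilities on the right side of Eqs 31–33 are, respectively, pS, pS+, and pA+ in Eqs 12–14 evaluated for any values of f, g and ρ, rather than at their true but unknown values.

When data TS, PS, and P are obtained, the likelihoods, given the data, that the state of the environment is f, g, and ρ are

$$\mathcal{L}_1(f, g \mid T_S) = \mathcal{B}(T_S \mid T, p_S(f, g)) \tag{34}$$

$$\mathcal{L}_2(f, g \mid P_S, T_S) = \mathcal{B}(P_S \mid T_S, p_{S+}(f, g)) \tag{35}$$

$$\mathcal{L}_3(f, g, \rho \mid P, P_S, T_S) = \mathcal{B}(P - P_S \mid T - T_S, p_{A+}(f, g, \rho)) \tag{36}$$

The likelihood of the data {TS, PS} from symptomatic individuals only depends on the values of f and g and is

$$\mathcal{L}_S(f, g \mid T_S, P_S) = \mathcal{L}_1(f, g \mid T_S)\, \mathcal{L}_2(f, g \mid P_S, T_S) \tag{37}$$

and the total likelihood of all the data {TS, PS, P} is

$$\mathcal{L}_T(f, g, \rho \mid T_S, P_S, P) = \mathcal{L}_S(f, g \mid T_S, P_S)\, \mathcal{L}_3(f, g, \rho \mid P, P_S, T_S) \tag{38}$$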

The likelihoods in Eqs 37 and 38 are products of binomial distributions that are well approximated, for sufficient numbers of tests, by the appropriate Gaussian distribution [30, 53]. In the results, we will explore log-likelihoods for both the binomial distributions and their Gaussian approximations.
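A minimal Python sketch of the total log-likelihood as a sum of three binomial log-probabilities follows (the exact binomial form, not the Gaussian approximation). The function names and the example data (T = 1500, TS = 135, PS = 58, P = 118) are assumptions introduced here; the three probabilities are those of sampling a symptomatic individual and of a positive test given symptomatic or asymptomatic status.

```python
import math


def log_binom(k, n, p):
    """Logarithm of the binomial probability of k events in n trials, for 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def total_log_likelihood(f, g, rho, T, T_S, P_S, P, p_sfn, p_sfp, p_afn, p_afp):
    """Log-likelihood of (f, g, rho) given all the data T_S, P_S, and P."""
    p_s = f + g
    p_s_plus = (f * (1.0 - p_sfn) + g * p_sfp) / (f + g)
    p_a_plus = (rho * f * (1.0 - p_afn)
                + (1.0 - f * (1.0 + rho) - g) * p_afp) / (1.0 - f - g)
    return (log_binom(T_S, T, p_s)                  # symptomatic among tested
            + log_binom(P_S, T_S, p_s_plus)         # positives among symptomatic
            + log_binom(P - P_S, T - T_S, p_a_plus))  # positives among asymptomatic


errors = (0.25, 0.03, 0.5, 0.003)
data = (1500, 135, 58, 118)
print(total_log_likelihood(0.05, 0.04, 1.5, *data, *errors))
print(total_log_likelihood(0.08, 0.04, 1.5, *data, *errors))
```

Working on the log scale avoids underflow when the three probabilities are multiplied, and the first call (at parameter values close to those generating such data) should return the larger value.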

Simplifying the likelihoods.

Keeping our eyes on the prize of computing the risk of including infected but asymptomatic individuals in groups of different sizes, we focus on f and ρ when constructing the CR. Exploring the likelihood is more convenient if one can eliminate having to deal with g explicitly. Two methods are the profile likelihood and the marginal likelihood [49]; both reduce the number of parameters from 3 to 2.

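For the profile likelihood, we replace g in Eqs 37 and 38 by the MLE $\hat{g}$, so that

$$\mathcal{L}_{S,\mathrm{prof}}(f \mid T_S, P_S) = \mathcal{L}_S(f, \hat{g} \mid T_S, P_S) \tag{39}$$

$$\mathcal{L}_{T,\mathrm{prof}}(f, \rho \mid T_S, P_S, P) = \mathcal{L}_T(f, \hat{g}, \rho \mid T_S, P_S, P) \tag{40}$$

For the marginal likelihood, we integrate Eqs 37 and 38 over the admissible values of g, so that

$$\mathcal{L}_{S,\mathrm{marg}}(f \mid T_S, P_S) = \int \mathcal{L}_S(f, g \mid T_S, P_S)\, dg \tag{41}$$

$$\mathcal{L}_{T,\mathrm{marg}}(f, \rho \mid T_S, P_S, P) = \int \mathcal{L}_T(f, g, \rho \mid T_S, P_S, P)\, dg \tag{42}$$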

By numerical exploration, I found that for the operational questions modeled here, the two methods give virtually the same results for the answers. Were we interested in the tails of the likelihood, this might not be the case. Since the profile likelihood is computationally much speedier, I report results using it. The third Rscript in S4 Section in S1 File allows one to explore the differences between marginal and profile likelihoods for the symptomatic data.

The compatibility regions from the profile likelihood.

I computed the approximate 95% CR from the total profile likelihood using a generalization of the method of Hudson [51] by first finding the maximum value of the profile log-likelihood and then determining the region in f, g, or f, ρ-space in which the log-likelihood was no more than 2 ⋅ 1.96 = 3.92 below its maximum value.
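The grid search just described can be sketched in Python as follows (the paper's own computations were done in R, and this is not the author's script). The sketch fixes g at its point estimate, evaluates the total log-likelihood on a grid of (f, ρ) values, and keeps the points within 3.92 of the maximum; the grid limits, the function names, and the example data are assumptions chosen for illustration.

```python
import math

ERRORS = dict(p_sfn=0.25, p_sfp=0.03, p_afn=0.5, p_afp=0.003)


def log_binom(k, n, p):
    """Logarithm of the binomial probability of k events in n trials, for 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def profile_log_likelihood(f, rho, g_hat, T, T_S, P_S, P, p_sfn, p_sfp, p_afn, p_afp):
    """Total log-likelihood with g held at its point estimate g_hat."""
    p_s = f + g_hat
    p_s_plus = (f * (1.0 - p_sfn) + g_hat * p_sfp) / (f + g_hat)
    p_a_plus = (rho * f * (1.0 - p_afn)
                + (1.0 - f * (1.0 + rho) - g_hat) * p_afp) / (1.0 - f - g_hat)
    return (log_binom(T_S, T, p_s) + log_binom(P_S, T_S, p_s_plus)
            + log_binom(P - P_S, T - T_S, p_a_plus))


def compatibility_region(T, T_S, P_S, P, errors, f_grid, rho_grid, drop=3.92):
    """Grid points (f, rho) whose profile log-likelihood is within `drop` of the peak."""
    g_hat = ((1.0 - errors["p_sfn"]) * T_S - P_S) / (
        T * (1.0 - errors["p_sfn"] - errors["p_sfp"]))
    surface = {(f, rho): profile_log_likelihood(f, rho, g_hat, T, T_S, P_S, P, **errors)
               for f in f_grid for rho in rho_grid}
    peak = max(surface.values())
    return [point for point, value in surface.items() if value >= peak - drop]


f_grid = [0.02 + 0.001 * i for i in range(61)]   # f from 0.02 to 0.08
rho_grid = [0.5 + 0.05 * i for i in range(61)]   # rho from 0.5 to 3.5
region = compatibility_region(1500, 135, 58, 118, ERRORS, f_grid, rho_grid)
print(len(region), min(f for f, _ in region), max(f for f, _ in region))
```

The boundary of the returned set of points approximates the contour of the compatibility region; a finer grid gives a smoother contour at greater computational cost.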

I did computations using R Studio 1.0.143 with underlying R 3.6.1 GUI 1.70 El Capitan build (7684) on an iMac running Mac OS 12.1.

Results

Test range with no information on symptoms and positivity less than the probability of a false positive test

When positivity is less than the probability of false positive test, φ(f|P, T) has, similar to the likelihood, its maximum at f = 0 and monotonically declines. The peak value of φ(f|P, T) and the rate of decline depend on the positivity and the number of tests (Fig 5).

Fig 5. The normalized likelihoods for incidence rate.

Shown are normalized likelihoods (i.e., posteriors with a uniform prior) when pFN = 0.25, pFP = 0.05, and positivity is (A) 0.025, (B) 0.0125, or (C) 0.00625. The colored curves correspond to different numbers of tests shown in the legend inset; since positivity is specified, higher numbers of tests are associated with lower levels of positivity.

https://doi.org/10.1371/journal.pone.0281710.g005

The normalized likelihoods in Fig 5 have a test range that depends on the number of tests (Fig 6). As with the situation in which positivity exceeds the probability of a false positive test, the test range declines with test numbers but at a decreasing rate.

Fig 6. The test range for incidence rate.

Shown are the test ranges for posteriors with a uniform prior when pFN = 0.25, pFP = 0.05, and positivity is (A) 0.025, (B) 0.0125, or (C) 0.00625.

https://doi.org/10.1371/journal.pone.0281710.g006

Point estimates, compatibility regions, and risk when there is information on symptoms

In the base case for Monte Carlo simulations, I set N = 1000 replicates of T = 1500 tests. Since simulation and sampling errors scale as the reciprocal of the square root of the number of replicates and of tests, these choices have inherent errors of the order of 3%, which are sufficient to understand the qualitative patterns and most of the quantitative patterns. I chose the parameters for the true state of the world and the test errors from those reported in [22–27]: ft = 0.05, gt = 0.04, and ρt = 1.5 and the test errors are pSFN = 0.25, pSFP = 0.03, pAFN = 0.5, and pAFP = 0.003. S3 Section in S1 File contains results for other choices of the true but unknown state of the world.
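One replicate of the testing process can be simulated by following the causal chain from T to TS, from TS to PS, and then to the positives among the T − TS asymptomatic individuals. The Python sketch below is an illustration under the base case parameters, not the author's R code; the function names, the seed, and the use of 200 rather than 1000 replicates are choices made here to keep the example quick.

```python
import random


def binomial_draw(rng, n, p):
    """Number of successes in n Bernoulli(p) trials (simple rather than fast)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_testing(rng, T, f, g, rho, p_sfn, p_sfp, p_afn, p_afp):
    """One replicate of the testing process; returns (T_S, P_S, P)."""
    p_s = f + g
    p_s_plus = (f * (1 - p_sfn) + g * p_sfp) / (f + g)
    p_a_plus = (rho * f * (1 - p_afn) + (1 - f * (1 + rho) - g) * p_afp) / (1 - f - g)
    T_S = binomial_draw(rng, T, p_s)             # symptomatic individuals tested
    P_S = binomial_draw(rng, T_S, p_s_plus)      # positives among symptomatic
    P_A = binomial_draw(rng, T - T_S, p_a_plus)  # positives among asymptomatic
    return T_S, P_S, P_S + P_A


rng = random.Random(2023)  # fixed seed so the sketch is repeatable
replicates = [simulate_testing(rng, 1500, 0.05, 0.04, 1.5, 0.25, 0.03, 0.5, 0.003)
              for _ in range(200)]
mean_T_S = sum(r[0] for r in replicates) / len(replicates)
print(mean_T_S)
```

Each simulated triple (TS, PS, P) can then be passed to the point-estimate formulas to obtain one replicate of the estimates.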

For the likelihood calculations and the associated risk computations, I first assume that the test results are the expected values of TS, PS, and P, which is a reasonable assumption when T is large enough, after which I allow the test results to vary more widely. For the base case parameters, the mean values are $E[T_S] = 135$, $E[P_S] = 58.05$, and $E[P] = 118.06$. Since actual test data can only produce integer values, I rounded $E[P_S]$ and $E[P]$ to 58 and 118, respectively. Doing so gives the point estimates $\hat{f} = 0.04995$, $\hat{g} = 0.04005$, and $\hat{\rho} = 1.5012$ (significant digits included to illustrate how little accuracy is lost by the rounding process; recall that the true values are ft = 0.05, gt = 0.04, and ρt = 1.5).
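The expected test data follow directly from the class fractions and test errors: each expected count is T times the probability of the corresponding outcome. A short Python sketch, with a hypothetical function name, is:

```python
def expected_test_data(T, f, g, rho, p_sfn, p_sfp, p_afn, p_afp):
    """Return the expected values (E[T_S], E[P_S], E[P]) for T tests."""
    e_ts = T * (f + g)                            # symptomatic individuals
    e_ps = T * (f * (1 - p_sfn) + g * p_sfp)      # positive and symptomatic
    e_pa = T * (rho * f * (1 - p_afn)             # positive and asymptomatic
                + (1 - f * (1 + rho) - g) * p_afp)
    return e_ts, e_ps, e_ps + e_pa


print(expected_test_data(1500, 0.05, 0.04, 1.5, 0.25, 0.03, 0.5, 0.003))
```

Rounding the second and third values to the nearest integer gives the test data used in the likelihood calculations.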

Illustrative simulated data.

The nth replicate of the simulation of the testing process yields estimates $\hat{f}_n$, $\hat{g}_n$, and $\hat{\rho}_n$. In Fig 7, I show the first 100 values of the simulation replicates. Each circle represents the value of $\hat{f}_n$, $\hat{g}_n$, or $\hat{\rho}_n$ on the nth replicate of the simulation. The thick red lines represent the averages over the entire 1000 simulations. There are also black lines at the true values of the three parameters.

Fig 7. Results of simulating the process of testing.

Shown (for ease of presentation) are the first 100 values of the point estimates for ft (upper left panel), gt (upper right panel), and ρt (lower left panel). The lower right panel is an expanded version of the point estimates for ρt. Each circle represents the value of $\hat{f}_n$, $\hat{g}_n$, or $\hat{\rho}_n$ on the nth replicate of the simulation. The thick red lines represent the averages over the entire 1000 simulations. There are also black lines at the true values of the three parameters. In the lower right panel, the y-axis is expanded to show that the mean of the $\hat{\rho}_n$ exceeds ρt; see the text for an explanation. The means of $\hat{f}_n$ and $\hat{g}_n$ essentially sit on top of the true values; again see the text for an explanation. The thin dotted lines show the means of the estimates ±1.96 times their standard deviations.

https://doi.org/10.1371/journal.pone.0281710.g007

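The means of $\hat{f}_n$ and $\hat{g}_n$ essentially sit on top of the true values, as we would expect from the analysis in Eqs 22–27 showing that $E[\hat{f}] = f_t$ and $E[\hat{g}] = g_t$. To quantify this agreement, I computed the mean relative error (ME) for the three estimates. For example, for $\hat{f}$, it is

$$ME(\hat{f}) = \frac{1}{N} \sum_{n=1}^{N} \frac{\hat{f}_n - f_t}{f_t} \tag{43}$$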
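The mean relative error is the average, over replicates, of the signed difference between each estimate and the true value, divided by the true value. A minimal Python sketch with a hypothetical function name and made-up estimates:

```python
def mean_relative_error(estimates, true_value):
    """Average signed relative deviation of the estimates from the true value."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return sum((e - true_value) / true_value for e in estimates) / len(estimates)


# Three illustrative (made-up) estimates of an incidence rate whose true value is 0.05
print(mean_relative_error([0.048, 0.051, 0.053], 0.05))
```

Because the error is signed, overestimates and underestimates cancel, so a small value indicates lack of bias rather than small scatter.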

For the simulation illustrated in Fig 7, the mean relative errors of $\hat{f}$ and $\hat{g}$ are both a fraction of a percent.

The lower right panel of Fig 7 has an expanded y-axis to show that the mean of the $\hat{\rho}_n$ exceeds ρt. For this run of the simulation, the mean relative error of $\hat{\rho}$ is less than 2.0%, but it is almost four times larger than the mean errors of $\hat{f}$ and $\hat{g}$.

In Fig 7, the thin dotted lines show the means of the estimates ±1.96 times their standard deviations. These are a naive 95% compatibility interval under a Gaussian approximation because they ignore the other two parameters. Even so, for the full set of 1000 replicates of the simulation, the fractions of points outside this naive interval are 0.045, 0.055, and 0.05, respectively, for $\hat{f}$, $\hat{g}$, and $\hat{\rho}$.

We conclude that the formulas for the MLEs accurately capture their true values. The specific results, of course, depend on the simulation results and the number of tests given (both addressed in the next section). For example, in a different run of 1000 simulations of the testing process, the mean relative errors were -0.0073, -0.0006, and -0.033 for $\hat{f}$, $\hat{g}$, and $\hat{\rho}$, respectively, and the fractions of points outside of the naive 95% CR were 0.054, 0.035, and 0.045 for $\hat{f}$, $\hat{g}$, and $\hat{\rho}$, respectively.

Likelihood, compatibility regions, and the risk of groups of different sizes.

In order to focus on a single value of “test data” we continue using the expected values of TS, PS, and P. After exploring the situation when test data are the mean value, we will vary the test data.

The likelihood of the symptomatic data.

On the way to the goal of estimating the fraction of asymptomatic infections, it is worthwhile to briefly stop and explore the likelihood of the symptomatic data, which are independent of ρ (Eq 37). In Fig 8, I show the likelihood when the means of TS and PS are the test results. In this figure, the white dot denotes the true values of parameters and sits at the peak of the heat map.

Fig 8. The likelihood (Eq 37) of the fractions of individuals who are infected and symptomatic, f, and uninfected and symptomatic g, when the test data are the means of TS and PS.

The white dot denotes the true values of the parameters.

https://doi.org/10.1371/journal.pone.0281710.g008

When the fraction of individuals who are infected and symptomatic is f and the fraction who are uninfected and symptomatic is g, the mean value of TS is T(f + g), so we expect a negative correlation between values of f and g, which is evident in the figure from the orientation of the likelihood contours.
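The negative correlation can be seen from the count of symptomatic individuals alone. As a minimal sketch, assume TS is binomial with T trials and success probability f + g. This is only one component of the likelihood and not Eq 37 itself; the function name and the values of f and g are illustrative assumptions. Any pair (f, g) with the same sum gives the same likelihood of TS, so an increase in f must be offset by a decrease in g.

```python
import math

def log_lik_symptomatic_count(ts, t, f, g):
    """Binomial log-likelihood of ts symptomatic individuals among t tested,
    assuming each tested individual is symptomatic with probability f + g."""
    p = f + g
    log_choose = math.lgamma(t + 1) - math.lgamma(ts + 1) - math.lgamma(t - ts + 1)
    return log_choose + ts * math.log(p) + (t - ts) * math.log(1.0 - p)

T = 1500                  # number of tests, as in Fig 9
f, g = 0.05, 0.10         # illustrative values, not the paper's parameters
ts = round(T * (f + g))   # use the mean of TS as the "data"

# Moving along the line f + g = constant leaves this likelihood unchanged ...
same_sum = [log_lik_symptomatic_count(ts, T, f + d, g - d) for d in (-0.02, 0.0, 0.02)]
# ... while raising f with g held fixed lowers it.
shifted = log_lik_symptomatic_count(ts, T, f + 0.02, g)

print(same_sum, shifted)
```

The count of symptomatic individuals therefore cannot separate f from g; the remaining test data are needed for that.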

For the purposes of the risk calculation, the most important role of the likelihood function of the symptomatic data is to provide the MLE value of g for construction of the profile likelihood in Eq 40, to which we now turn.

The likelihood of all the data.

In Fig 9, I show the profile likelihood (Eq 40) for f and ρ when the test data are the mean values of TS, PS, and P. The banana shape of the contours of the 95% CR is a result of the nonlinearity in Eq 28. In this case, the likelihood is centered at the true values of the parameters (shown by the white dot). When Eq 28 is converted to a function ρ(f) by using the MLE $\hat{g}$ and replacing $\hat{f}$ by f and $\hat{\rho}$ by ρ, the true values of the parameters sit on the resulting curve, which runs through the middle of the 95% CR.

Fig 9. The profile log-likelihood (Eq 40) for the fraction of individuals who are infected and symptomatic, f, and the ratio of the fraction of individuals who are infected and asymptomatic to those who are infected and symptomatic, ρ, when the test data are the means of TS, PS, and P.

The white dot denotes the true values of the parameters. The 95% CR contour is shown in white for the exact binomial likelihoods and in gray for the Gaussian approximation to those likelihoods. The white dotted line is obtained by replacing $\hat{g}$ in Eq 28 by its MLE value, replacing $\hat{f}$ by an arbitrary value f, and then viewing the right side as an equation for the ratio ρ(f) of asymptomatic to symptomatic infected individuals.

https://doi.org/10.1371/journal.pone.0281710.g009

One property of the range formula in Eq 2 is that the test range declines as $1/\sqrt{T}$, where T is the number of tests. That is, although the range declines as the number of tests increases, it does so at a decreasing rate [30, Figures 6-8 and p. 16ff]. This observation is more than an academic point, because it has the operational implication that it is possible to over-sample by providing too many tests in a single spatial region (see also [30]).
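The diminishing return from additional tests can be illustrated with a generic binomial proportion. The sketch below is not Eq 2; it assumes that the width of an approximate 95% interval is 2 × 1.96 × sqrt(p(1 − p)/T) under a Gaussian approximation, with an illustrative positivity p and a hypothetical function name.

```python
import math

def approx_range_width(p, t, z=1.96):
    """Width of a Gaussian-approximation interval for a binomial proportion."""
    return 2.0 * z * math.sqrt(p * (1.0 - p) / t)

p = 0.05  # illustrative fraction of positive tests (assumption)
for t in (500, 1000, 2000, 4000, 8000):
    print(t, round(approx_range_width(p, t), 4))

# Each doubling of T shrinks the width by the same factor 1/sqrt(2), so the
# absolute gain from the next batch of tests keeps getting smaller.
```

Under this scaling, halving the width requires four times as many tests, which is one way to see why concentrating tests in a single region eventually buys little.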

In Fig 10, I explore the consequences of simultaneously increasing the number of tests and relaxing the assumption that the test data are the mean values of TS, PS, and P. The true values (the white dots) then no longer sit in the middle of the 95% CR or on the curve ρ(f), and the contours move in space, as determined by the test results. As with Eq 2, contours shrink as the number of tests increases, but at a decreasing rate.

Fig 10. The consequences of varying test numbers and letting test data vary from the mean values of TS, PS and P.

In each panel the white dot represents ft and ρt, and the dotted curve is the function ρ(f) described in the caption to the previous figure and which now depends on the test results. The upper left panel reproduces Fig 9, in which T = 1500 and the test data are the mean values of TS, PS, and P. In the other panels, the test data are a random realization of the simulation of the testing process and going clockwise from the upper left panel, T = 2000, 3000, 3500, 4000, and 4500.

https://doi.org/10.1371/journal.pone.0281710.g010

The prize: The risk of including asymptomatic infected individuals in groups of different sizes.

We are now able to compute the risk of including asymptomatic but infected individuals in groups of different sizes, to generate a curve analogous to Fig 1. For any values of f and ρ, the fraction of asymptomatic infected individuals in the population is ρf. Hence, writing G for the group size, the analogue of Eq 3 is

$$R(G) = 1 - (1 - \rho f)^{G} \tag{44}$$

In Fig 11, I show the risk computed from Eq 44 using the mean test data, the profile likelihood, and one of three values of ρf: the product $\hat{\rho}\hat{f}$ of the MLEs, the minimum of ρf on the 95% CR contour, or the maximum of ρf on the 95% CR contour. Clearly, one can invert Eq 44 in analogy to Eq 4 and compute analogues of the results shown in Fig 2.
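A minimal sketch of the risk calculation follows. It assumes that the risk is the probability that a group of independently drawn individuals contains at least one asymptomatic infected individual, that is, 1 − (1 − ρf) raised to the power of the group size. The function names, the values of f and ρ, and the three contour points are hypothetical and are not the values behind Fig 11.

```python
import math

def risk(group_size, f, rho):
    """Probability that a group contains at least one asymptomatic infected
    individual when the asymptomatic infected fraction is rho * f."""
    return 1.0 - (1.0 - rho * f) ** group_size

def largest_group_for_risk(max_risk, f, rho):
    """Invert the risk formula: largest group size whose risk is <= max_risk."""
    return math.floor(math.log(1.0 - max_risk) / math.log(1.0 - rho * f))

# Hypothetical point estimates and points on a 95% CR contour, as (f, rho).
f_hat, rho_hat = 0.05, 0.4
contour = [(0.04, 0.30), (0.05, 0.55), (0.06, 0.45)]

low = min(f * rho for f, rho in contour)    # minimum of rho * f on the contour
high = max(f * rho for f, rho in contour)   # maximum of rho * f on the contour

for n in (5, 10, 25, 50, 100):
    # risk() depends only on the product rho * f, so pass it with rho = 1.
    print(n, risk(n, f_hat, rho_hat), risk(n, low, 1.0), risk(n, high, 1.0))

print(largest_group_for_risk(0.10, f_hat, rho_hat))
```

The three columns correspond to the solid line and the two dotted lines of a figure like Fig 11, and the last line is the kind of inversion that would yield an analogue of Fig 2.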

Fig 11. The risk of including asymptomatic individuals in groups of different sizes.

The solid line corresponds to using the MLEs $\hat{f}$ and $\hat{\rho}$ in the risk formula (Eq 44), and the two dotted lines correspond to using the minimum and maximum values of ρf on the 95% CR contour.

https://doi.org/10.1371/journal.pone.0281710.g011

Discussion

It is important to recognize that the analysis presented in this paper is a procedure that allows one to go from test information to the risk of including infected individuals in groups of various sizes when there is no information on symptoms at the time of testing, or to the risk of including asymptomatic infected individuals in groups of different sizes when there is information on symptoms. Rather than being binary (risky or not), this risk is graded, and the specific details of the relationship between group size and risk depend upon the operational details of testing, such as test numbers and errors. Once these are specified, the procedures can be employed.

Let us now consider three limitations of the methods developed here. First, note that Eq 26 has the same problem as Eq 1 when the positivity is very small. To see this, factor TS out of the far right side of Eq 26: the result shows that if the positivity rate among symptomatic individuals falls below the probability of a false positive test among symptomatic individuals, $\hat{f}$ is less than zero. As in the situation with no information on symptoms, the operational interpretation is that we then set $\hat{f}$ to 0. Alternatively, we may generalize the analysis for the simpler case by putting a prior on the parameters and determining the CR in that manner.
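The truncation at zero can be sketched with the standard correction of an observed positivity for test errors (often called the Rogan–Gladen estimator), applied here to the symptomatic tests. Whether Eq 26 has exactly this form is an assumption made for illustration only; the function name, argument names, and numbers are hypothetical.

```python
def corrected_fraction(positives, tested, p_false_pos, p_false_neg):
    """Correct an observed positivity for test errors and truncate at zero.

    Assumes the expected positivity is q * (1 - p_false_neg) + (1 - q) * p_false_pos,
    where q is the true infected fraction among those tested.
    """
    positivity = positives / tested
    raw = (positivity - p_false_pos) / (1.0 - p_false_neg - p_false_pos)
    return max(0.0, raw)   # operational rule: a negative estimate is set to 0

# Positivity (2%) above the false-positive probability (1%): positive estimate.
print(corrected_fraction(positives=6, tested=300, p_false_pos=0.01, p_false_neg=0.2))

# Positivity (about 0.67%) below the false-positive probability: truncated to 0.
print(corrected_fraction(positives=2, tested=300, p_false_pos=0.01, p_false_neg=0.2))
```

The sign of the untruncated estimate is set entirely by the difference between the positivity and the false-positive probability, which is the point made in the text.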

Second, an objection may be made that the binomial distribution underlying the analysis relies on the strong assumption that tests are independent events but that often groups of people will test together so that modeling the testing process requires an aggregated distribution. This is a fair objection, however: 1) the binomial distribution is the appropriate starting point, and if the sample is large and diverse enough (e.g., from many different testing sites), the independence assumption should be at least approximately valid; and 2) a negative binomial distribution of the form used in ecology to model aggregated counts [48, pp. 103–111] is a natural starting point for extending the work here.

Third, an objection may be made that we have assumed the values for test errors rather than estimating them. DiCiccio et al. [54] show that estimating test errors at the same time as incidence rate is a much more complex problem, and is likely one whose solution is not easily transferred to recommendations for practice. An alternative is to stratify test results by both symptomatic or not and time since putative exposure, have approximate values for the test errors for each time since exposure, and conduct sensitivity analysis by varying the values of the test errors. Furthermore, Mangel and Brown [30, pp. 25–27] show how to generalize Eq 1 for the case of a distribution of test errors using the delta-method. A similar extension of Eqs 18, 19 and 21 is a potential next step in this work.
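A sensitivity analysis of the kind suggested above can be sketched by recomputing a point estimate over a grid of assumed test errors. The estimator used is the standard correction of positivity for false positives and false negatives, not necessarily Eqs 18, 19 and 21; the grid values and the test counts are hypothetical.

```python
from itertools import product

def corrected_fraction(positivity, p_false_pos, p_false_neg):
    """Positivity corrected for test errors, truncated to the interval [0, 1]."""
    raw = (positivity - p_false_pos) / (1.0 - p_false_neg - p_false_pos)
    return min(1.0, max(0.0, raw))

positivity = 60 / 1500                    # hypothetical: 60 positives in 1500 tests
false_pos_grid = (0.005, 0.01, 0.02)      # assumed false-positive probabilities
false_neg_grid = (0.10, 0.20, 0.30)       # assumed false-negative probabilities

results = {
    (fp, fn): corrected_fraction(positivity, fp, fn)
    for fp, fn in product(false_pos_grid, false_neg_grid)
}
for (fp, fn), estimate in results.items():
    print(f"p_FP={fp:.3f}  p_FN={fn:.2f}  estimate={estimate:.4f}")

print("range of estimates:", min(results.values()), max(results.values()))
```

The spread between the smallest and largest estimate gives a rough sense of how much the conclusions depend on the assumed test errors; stratifying by symptom status or time since exposure would repeat the same loop within each stratum.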

In some locations, individuals are already asked whether they are symptomatic or not at the time of testing. For example, Nomi Health in Utah requires a self-reporting form for obtaining a coronavirus test, and the form includes yes or no questions such as: “Do you have a fever, a cough, new or increased shortness of breath, decreased smell or taste, a sore throat, muscle aches or pains, a headache, congestion or a runny nose, nauseas or vomiting, diarrhea, fatigue?”

During the 2020–2022 academic years, natural experiments in testing were occurring on college campuses [55]. The results of those tests will provide a trove of information to explore with the methods developed here.

Conclusions

In conclusion, the approach of modeling and simulating the process of testing before analyzing testing data leads to a range of insights and at least the following operational recommendations:

  • At the time of testing, collect information on whether an individual is symptomatic or not.
  • At the time of testing, collect information on putative time since exposure to infection.
  • Conduct experiments to obtain information on means and variances of test errors.

There is much to be done and no time to lose before the next pandemic.

Supporting information

S1 File. Including a brief review of the binomial distribution and likelihood, a mathematical appendix with details of calculations in the main text, sensitivity analysis when there is information on symptoms, and codes that generate the results in the main text and sensitivity analysis.

https://doi.org/10.1371/journal.pone.0281710.s001

(PDF)

Acknowledgments

I thank Alan Brown for inviting me, early in the pandemic, to think about operational analysis of testing for coronavirus with him, and Matt Shaffer for his steady encouragement during the work. For comments on presentations and this manuscript, I thank three anonymous referees, Tiffany Bogich, Rebecca Borchering, Alan Brown, Emily Howerton, John Ivancovich, Elyse Johnson, Chaya Pflugeisen, Matt Shaffer, Katriona Shea, and Joseph Travis.

References

  1. Lederberg J. Pandemic as a natural evolutionary phenomenon. Soc Res. 1988 Oct 1:343–59.
  2. Birx D. Silent Invasion. New York: Harper Collins. 2022.
  3. Doughton S. Bill Gates: We must prepare for the next pandemic like we prepare for war. Seattle Times. 27 Jan 2021 [Cited 2021 Jan 27].
  4. Gates B. How To Prevent the Next Pandemic. New York: A.A. Knopf. 2022.
  5. Gilbert S. Vaccine vs Virus: This race and the next one. The 44th Dimbleby Lecture. 2021. [Cited 2022 Feb 21]. Available from https://www.ox.ac.uk/news/2021-12-07-professor-dame-sarah-gilbert-delivers-44th-dimbleby-lecture
  6. Budiansky S. Blackett’s War. The men who defeated the Nazi U-boats and brought science to the art of warfare. New York: A.A. Knopf. 2013.
  7. Mangel M. Applied mathematicians and naval operators. SIAM Review. 1982 Jul; 24(3):289–300.
  8. Morse PM. In at the beginnings: A physicist’s life. MIT Press; 1977.
  9. Tidman KR. The Operations Evaluation Group: A History of Naval Operations Analysis. Annapolis, MD: Naval Institute Press; 1984.
  10. Shelton AO, Mangel M. Reply to Sugihara et al: The biology of variability in fish populations. Proceedings of the National Academy of Sciences. 2011 Nov 29;108(48):E1226. Available from www.pnas.org/cgi/doi/10.1073/pnas.1115765108
  11. Atkins BD, Jewell CP, Runge MC, Ferrari MJ, Shea K, Probert WJ, et al. Anticipating future learning affects current control decisions: A comparison between passive and active adaptive management in an epidemiological setting. J Theor Biol. 2020 Dec 7;506:110380. https://doi.org/10.1016/j.jtbi.2020.110380 pmid:32698028
  12. Howerton E, Ferrari MJ, Bjørnstad ON, Bogich TL, Borchering RK, Jewell CP, et al. Synergistic interventions to control COVID-19: Mass testing and isolation mitigates reliance on distancing. PLoS Comput Biol. 2021 Oct 28;17(10):e1009518. pmid:34710096
  13. Huang NE, Qiao F, Wang Q, Qian H, Tung KK. A model for the spread of infectious diseases compatible with case data. Proceedings of the Royal Society A. 2021 Oct 27;477(2254):20210551. https://doi.org/10.1098/rspa.2021.0551
  14. Nichols JD, Bogich TL, Howerton E, Bjørnstad ON, Borchering RK, Ferrari M, et al. Strategic testing approaches for targeted disease monitoring can be used to inform pandemic decision-making. PLoS Biol. 2021 Jun 17;19(6):e3001307. pmid:34138840
  15. Reimer JR, Ahmed SM, Brintz BJ, Shah RU, Keegan LT, Ferrari MJ, et al. The effects of using a clinical prediction rule to prioritize diagnostic testing on transmission and hospital burden: A modeling example of early severe acute respiratory syndrome Coronavirus 2. Clin Infect Dis. 2021 Nov 15;73(10):1822–30. pmid:33621329
  16. Shea K, Runge MC, Pannell D, Probert WJ, Li SL, Tildesley M, et al. Harnessing multiple models for outbreak management. Science. 2020 May 8;368(6491):577–9. pmid:32381703
  17. Struben J. The coronavirus disease (COVID-19) pandemic: simulation-based assessment of outbreak responses and postpeak strategies. Syst Dyn Rev. 2020 Jul;36(3):247–93. pmid:33041496
  18. Wan R, Zhang X, Song R. Multi-Objective Model-based Reinforcement Learning for Infectious Disease Control. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. Singapore [Virtual Event]. 2021 Aug 14–18. 1634–1644.
  19. Nuzzo J, Mullen L, Snyder M, Cicero A, Inglesby TV. Preparedness for a high-impact respiratory pathogen pandemic. Baltimore, MD: Johns Hopkins Center for Health Security. 2019. [Cited 2022 Mar 11]. Available from https://www.centerforhealthsecurity.org/our-work/publications/preparedness-for-a-high-impact-respiratory-pathogen-pandemic
  20. Kapczynski A. Order without intellectual property law: Open science in influenza. Cornell L. Rev. 2016;102:1539–1615.
  21. Stein JG. Take It off-site: World order and international institutions after COVID-19. In: Brands H, Gavin FJ. COVID-19 and World Order. Baltimore, MD: Johns Hopkins University Press; 2020. 259–76.
  22. Arevalo-Rodriguez I, Buitrago-Garcia D, Simancas-Racines D, Zambrano-Achig P, Del Campo R, Ciapponi A, et al. False-negative results of initial RT-PCR assays for COVID-19: a systematic review. PloS One. 2020 Dec 10;15(12):e0242958. https://doi.org/10.1371/journal.pone.0242958 pmid:33301459
  23. Green DA, Zucker J, Westblade LF, Whittier S, Rennert H, Velu P, et al. Clinical performance of SARS-CoV-2 molecular tests. J Clin Microbiol. 2020 Jul 23;58(8):e00995–20. https://doi.org/10.1128/JCM.00995-20 pmid:32513858
  24. He JL, Luo L, Luo ZD, Lyu JX, Ng MY, Shen XP, et al. Diagnostic performance between CT and initial real-time RT-PCR for clinically suspected 2019 coronavirus disease (COVID-19) patients outside Wuhan, China. Respir Med. 2020 Jul 1;168:105980. pmid:32364959
  25. Kucirka LM, Lauer SA, Laeyendecker O, Boon D, Lessler J. Variation in false-negative rate of reverse transcriptase polymerase chain reaction–based SARS-CoV-2 tests by time since exposure. Ann Intern Med. 2020 Aug 18;173(4):262–7. pmid:32422057
  26. Sethuraman N, Jeremiah SS, Ryo A. Interpreting diagnostic tests for SARS-CoV-2. JAMA. 2020 Jun 9;323(22):2249–51. pmid:32374370
  27. Watson J, Whiting PF, Brush JE. Interpreting a COVID-19 test result. BMJ. 2020 May 12;369. https://doi.org/10.1136/bmj.m1808 pmid:32398230
  28. Waller L, Levi T. Building Intuition Regarding the Statistical Behavior of Mass Medical Testing Programs. Harvard Data Science Review. Special Issue 1—Covid-19: Unprecedented Challenges and Chances. 2021.
  29. Brown A, Mangel M. Operational Analysis for Coronavirus Testing: Recommendations for Practice. Laurel, MD: Johns Hopkins University Applied Physics Laboratory. 2021. [Cited 2022 Jan 7]. NSAD-R-21-014. Available from https://www.jhuapl.edu/Content/documents/OperationalAnalysisCoronavirusTesting.pdf
  30. Mangel M, Brown A. Operational Analysis for Coronavirus Testing. Laurel, MD: Johns Hopkins University Applied Physics Laboratory Report. 2021. [Cited 2021 Jan 7]. NSAD-R-21-041. Available from https://www.jhuapl.edu/Content/documents/MangelBrown.pdf
  31. Böttcher L, D’Orsogna MR, Chou T. A statistical model of COVID-19 testing in populations: effects of sampling bias and testing errors. Philos Trans A Math Phys Eng Sci. 2022 Jan 10;380(2214):20210121. https://doi.org/10.1098/rsta.20210121
  32. Vasiliauskaite V, Antulov-Fantulin N, Helbing D. On some fundamental challenges in monitoring epidemics. Philos Trans A Math Phys Eng Sci. 2022 Jan 10;380(2214):20210117. pmid:34802270
  33. Ioannidis JP, Cripps S, Tanner MA. Forecasting for COVID-19 has failed. Int. J. Forecast. 2020 Aug 25. 38:423–38. https://doi.org/10.1016/j.ijforecast.2020.08.004 pmid:32863495
  34. McElreath R. Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman and Hall/CRC; 2020 Mar 13.
  35. Morey RD, Hoekstra R, Rouder JN, Lee MD, Wagenmakers EJ. The fallacy of placing confidence in confidence intervals. Psychonomic bulletin & review. 2016 Feb;23(1):103–23. pmid:26450628
  36. Beale S, Hayward A, Shallcross L, Aldridge RW, Fragaszy E. A rapid review and meta-analysis of the asymptomatic proportion of PCR-confirmed SARS-CoV-2 infections in community settings. Wellcome Open Research. 2020;5:266 [Cited 2021 Jun 21].
  37. Buitrago-Garcia D, Egli-Gany D, Counotte MJ, Hossmann S, Imeri H, Ipekci AM, et al. Occurrence and transmission potential of asymptomatic and presymptomatic SARS-CoV-2 infections: A living systematic review and meta-analysis. PLoS Med. 2020 Sep 22;17(9):e1003346. https://doi.org/10.1371/journal.pmed.1003346 pmid:32960881
  38. Day M. Covid-19: four fifths of cases are asymptomatic, China figures indicate. BMJ 2020:369:m1375. 2020. pmid:32241884
  39. Ferguson J, Dunn S, Best A, Mirza J, Percival B, Mayhew M, et al. Validation testing to determine the sensitivity of lateral flow testing for asymptomatic SARS-CoV-2 detection in low prevalence settings: Testing frequency and public health messaging is key. PLoS Biol 2021 Apr 29;19(4):e3001216. https://doi.org/10.1371/journal.pbio.3001216 pmid:33914730
  40. He X, Lau EH, Wu P, Deng X, Wang J, Hao X, et al. Temporal dynamics in viral shedding and transmissibility of COVID-19. Nat Med. 2020 May;26(5):672–5. pmid:32296168
  41. He J, Guo Y, Mao R, Zhang J. Proportion of asymptomatic coronavirus disease 2019: A systematic review and meta-analysis. J Med Virol. 2021 Feb;93(2):820–30. pmid:32691881
  42. Kalish H, Klumpp-Thomas C, Hunsberger S, Baus HA, Fay MP, Siripong N, et al. Undiagnosed SARS-CoV-2 seropositivity during the first 6 months of the COVID-19 pandemic in the United States. Sci Transl Med. 2021 Jul 7;13(601):eabh3826. pmid:34158410
  43. Li R, Pei S, Chen B, Song Y, Zhang T, Yang W, et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science. 2020 May 1;368(6490):489–93. pmid:32179701
  44. Long QX, Tang XJ, Shi QL, Li Q, Deng HJ, Yuan J, et al. Clinical and immunological assessment of asymptomatic SARS-CoV-2 infections. Nat Med. 2020 Aug;26(8):1200–4. pmid:32555424
  45. Oran DP, Topol EJ. Prevalence of asymptomatic SARS-CoV-2 infection: A narrative review. Ann Intern Med. 2020 Sep 1;173(5):362–7. pmid:32491919
  46. Rasmussen AL, Popescu SV. SARS-CoV-2 transmission without symptoms. Science. 2021 Mar 19;371(6535):1206–7. pmid:33737476
  47. Shental N, Levy S, Wuvshet V, Skorniakov S, Shalem B, Ottolenghi A, et al. Efficient high-throughput SARS-CoV-2 testing to detect asymptomatic carriers. Sci Adv. 2020 Sep 11;6(37):eabc5961. pmid:32917716
  48. Mangel M. The theoretical biologist’s toolbox: quantitative methods for ecology and evolutionary biology. Cambridge University Press; 2006 Jul 27.
  49. Hilborn R, Mangel M. The Ecological Detective. Confronting Models with Data. Princeton University Press. 1997.
  50. Edwards AWF. Likelihood. Expanded Edition, Baltimore and London: The Johns Hopkins University Press. 1992.
  51. Hudson DJ. Interval estimation from the likelihood function. Philos Trans R Soc Lond B Biol Sci. 1971 Jul;33(2):256–62.
  52. Severini TA. Likelihood methods in statistics. Oxford University Press; 2000.
  53. Feller W. An Introduction to Probability Theory and Its Applications, Volume 1. New York: John Wiley & Sons. 1968.
  54. DiCiccio TJ, Ritzwoller DM, Romano JP, Shaikh AM. Confidence intervals for seroprevalence. arXiv:2103.15018v2. 2021 Aug 9. Statistical Science in press
  55. Booeshaghi AS, Tan F, Renton B, Berger Z, Pachter L. Markedly heterogeneous COVID-19 testing plans among US colleges and universities. MedRxiv. Preprint. [Cited 2020 Oct 5]. https://doi.org/10.1101/2020.08.09.20171223