Estimating force of infection from serologic surveys with imperfect tests

Background The force of infection, or the rate at which susceptible individuals become infected, is an important public health measure for assessing the extent of outbreaks and the impact of control programs. Methods and findings We present Bayesian methods for estimating force of infection using serological surveys of infections which produce a lasting immune response, accounting for imperfections of the test, and uncertainty in such imperfections. In this estimation, the sensitivity and specificity can either be fixed, or belief distributions of their values can be elicited to allow for uncertainty. We analyse data from two published serological studies of dengue, one in Colombo, Sri Lanka, with a single survey and one in Medellin, Colombia, with repeated surveys in the same individuals. For the Colombo study, we illustrate how the inferred force of infection increases as the sensitivity decreases, and the reverse for specificity. When 100% sensitivity and specificity are assumed, the results are very similar to those from a standard analysis with binomial regression. For the Medellin study, the elicited distribution for sensitivity had a lower mean and higher variance than the one for specificity. Consequently, taking uncertainty in sensitivity into account resulted in a wide credible interval for the force of infection. Conclusions These methods can make more realistic estimates of force of infection, and help inform the choice of serological tests for future serosurveys.


Introduction
The force of infection, or the rate at which susceptible individuals become infected, is an important public health measure of the speed and extent of an epidemic. It can be used to quantify the impact of disease control programs, and prioritize and identify geographical regions requiring further measures, such as vaccine implementation [1][2][3][4][5]. For infections inducing a lasting immune response, the force of infection is usually estimated via serological surveys ('serosurveys') of immunological status. Ideally, assays used in such surveys should be highly sensitive and specific, while also suitable for high throughput in terms of cost and personnel requirements [6][7][8][9][10][11]. In practice, however, available assays may not completely meet all these criteria, as is currently evident with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus responsible for coronavirus disease (COVID-19) [12]. The greater the imperfections in sensitivity and specificity, the less accurate will be the corresponding estimates of the force of infection, as long as such imperfections are not taken into account.
The force of infection may be estimated from single or repeated serosurveys. In the former case, the simplest analysis is to assume that the force of infection was constant over calendar time and age, and consider each person's age to be their duration of exposure [13]. More sophisticated models allow for changing force of infection over time, or over age, or even allow for the presence of maternal antibodies if infants are included [4,14]. Carrying out more than one survey in the same individuals provides more robust estimates of the force of infection during a given study period [4,7]. Using repeated surveys, rate ratios can be obtained from binomial regression with complementary log-log link and the logarithm of the time between surveys as an offset [13]. While age is used as the time at risk in the analysis of a single survey, in repeated surveys it can be considered a risk factor like any other. However, whatever the number of surveys, errors in test status are usually ignored, whether analysing one or more surveys. In particular, for repeated surveys, individuals testing positive at baseline are usually considered no longer at risk [1,4,7], ignoring the possibility that they were false positives.
The choice of assay may substantially affect the study's interpretation [15]. Various methods have taken into account certain kinds of test imperfection, for either single or repeated surveys. In particular, Trotter & Gay [16] developed a compartmental model of multiple surveys, in which the force of infection and imperfect sensitivity were estimated for Neisseria meningitidis. For a single survey, Alleman et al. [17] and Hachiya et al. [18] estimated the force of infection, and simultaneously test sensitivity for rubella and measles, by assuming that imperfect sensitivity was the reason for seroprevalence not necessarily reaching 100% at the highest ages. Tan et al. [19] used a model for dengue, in which sensitivity reduces over time as antibody levels decrease, applied to two independent population serosurveys from blood donors. Olive et al. [20] estimated the force of infection for Rift Valley fever based on fixed values of sensitivity and specificity for a single survey.
Here we provide methods to estimate force of infection, from a single serosurvey or two serosurveys in the same individuals, accounting for imperfections in sensitivity and/or specificity, and for uncertainty in these parameters.

Methods
We started from methods for estimating prevalence based on an imperfect diagnostic test, as reviewed by Lewis & Torgerson [21], and use similar notation. Estimation is done using a Bayesian framework and Markov chain Monte Carlo (MCMC) [22]. We assume that the immune response being measured is long-lasting so that, for example, apparent seroreversions, i.e. changes over time from positive to negative, are due to test errors, rather than loss of immunity. We use "seroprevalence" to mean the proportion of individuals with the underlying immune response, which the diagnostic tests measure with error.

Model for single serosurvey
For a diagnostic test, the sensitivity is the proportion of true positives that are correctly identified by the test, and the specificity is the proportion of true negatives that are correctly identified by the test [23]. The probability of testing positive (T + ) is specified as a function of the unobserved true status (π), and the assumed values for sensitivity (S e ) and specificity (S p ): ProbðT þ Þ ¼ S e p þ ð1 À S p Þð1 À pÞ Then, representing a constant seroconversion rate, a binomial regression is specified with a complementary log-log link, and the logarithm of age as an offset. The only other term in the model is an intercept, which is the logarithm of the force of infection [13]. As a rate, the force of infection can take non-negative values, possibly greater than 1 [24]. A vague prior distribution-Gaussian with mean zero and standard deviation 1,000-is specified for the logarithm of the force of infection. Example data from a single serosurvey of dengue are from Colombo, Sri Lanka, which used a capture enzyme-linked immunosorbent assay (ELISA) to detect immunoglobulin G (IgG) [14,25]. Here we omit individuals aged less than six months in order to limit the influence of maternal antibodies.

Model for two consecutive serosurveys in the same individuals
For two repeated serosurveys, priors are placed on the seroprevalences, and the values of interest are related via standard identities. The baseline seroprevalence is assigned a beta distribution with both parameters equal to 1, i.e. uniform on the interval [0, 1]. The prior for the second seroprevalence is the same except that, consistent with the above assumptions, it is constrained to be at least as high as the baseline seroprevalence. For each survey, the positive and negative predictive values (PPV and NPV, respectively) are defined in terms of the seroprevalence and the assumed sensitivity and specificity [26]: The probability of each person being truly seropositive, Prob(D + ), is then PPV if the test is positive, and 1 − NPV if the test is negative. The probabilities of testing positive or negative are functions of sensitivity and specificity, in the same way as for a single survey. Finally, the numerator of the force of infection is estimated as the increase in expected number of true positives from the first to the second survey, and the denominator is estimated as the expected person-time at risk, calculated as the sum of the individual times between the surveys, weighted by each individual's probability of being seronegative at baseline. This is shown in the following equation, where the sum is over all individuals in both surveys, and the subscripts on D indicate the first or second survey: This is shown schematically, as a Directed Acyclic Graph [27], in Fig 1. Example data are from a community-based study of dengue in Medellin, Colombia, using a commercially available IgG indirect ELISA test [25] (S5 File). Residents were randomly selected, and tested in up to five surveys over time. For the current purpose, we use only the first survey, done in 2011, and the last one, done in 2014, approximately 26 months later.
In the standard binomial regression model for seroconversion across paired surveys, those individuals positive at baseline are assumed to be not at risk, i.e. there is no allowance for measurement error in the serostatus. By contrast, as well as seroconversion, the current model allows seroreversion, i.e. for individuals to change from seropositive to seronegative status.

Uncertainty in sensitivity and specificity
Fixed values for sensitivity and specificity can be used for the repeated surveys, as described above for a single one. However, there may be reasonable doubt as to the exact values of sensitivity and specificity, e.g. because of cross-reacting pathogens circulating to an unknown extent. This uncertainty may have been quantified by systematic reviews, although their generalizability to a given setting may be doubtful. Another way to quantify uncertainty is in terms of expert opinion, e.g. via the Delphi technique [28]. Here we follow the elicitation method of Johnson et al. [29]. For each parameter, each expert is presented with a range of values. For the current purpose, the parameters are sensitivity and specificity, each with a range of 0 to 100%, in intervals ("bins") of 5%. Each expert is invited to i) make a point or "average" estimate of the parameter in question, then ii) indicate the upper and lower limit of their estimate, then iii) indicate their weight of belief by allocating a total of 100% over the bins, between the upper and lower limits, in units of 5%. So we have a total of six questions: three each for sensitivity and specificity. Johnson et al., used paper questionnaires and stickers for the units of 5% weight of belief. We adapted this to a spreadsheet in Microsoft Excel (S1 File). This approach could also be applied to the analysis of a single survey.
For the current study, beliefs were elicited from three dengue researchers whose published work includes results of diagnostic tests. Two of these (JKL & MC) were also investigators of the serological study in Medellin [25]. In the case of dengue, one important consideration is whether the test in question may cross-react with other flaviviruses [30], or have lower specificity in those who have been vaccinated against them [31]. The elicited distributions for sensitivity and specificity are used here to illustrate the current method and are not conclusive in terms of the performance of the test in question. Also, the belief distributions for other diagnostic tests and other settings will vary.
The beliefs of the three experts were summarized as a single distribution using linear pooling [32], and a smooth distribution between 0 and 1 was fitted to the result. Both beta and logistic-normal distribution families were used: each has two parameters, which were fitted by the method of moments [33]. The beta distribution was used for the estimation of the force of infection.
More broadly, some models for sensitivity and specificity are unidentifiable [34,35], i.e. not all the parameters can be estimated independently. For the current purpose, the estimated force of infection is evidently strongly associated with the sensitivity and specificity. Although posterior likelihoods of sensitivity, specificity and force of infection could be obtained from a formally consistent Bayesian model, the identifiability of such a model would need to be demonstrated. Our interest here is in information on sensitivity and specificity as inputs, not outputs. Hence, we have not referred to the elicited beliefs for sensitivity and specificity as "priors". Although these beliefs are used in Monte Carlo simulation, posterior likelihoods are not obtained for them. Rather, values are repeatedly sampled from the fitted beta distributions of sensitivity and specificity, then MCMC estimation is done based on those values.
Credible intervals for a parameter are estimated as quantiles of samples drawn, via MCMC, from its Bayesian posterior distribution. Confidence intervals (as opposed to credible intervals) are quoted from frequentist analyses which were carried out for comparison.

Software
We use the "rjags" package in R (version 3.6.3; The R Foundation for Statistical Computing). This package requires a separate installation of the JAGS package [36]. R code is provided in S2-S4 Files. For the analysis of the Colombo survey, for each value of sensitivity and specificity used, a burn-in of 1,000 iterations was used, with estimates of the force of infection based on 50,000 iterations thinned by 10 (i.e. keeping every 10th result). For the analysis of the repeated surveys in Medellin, the following was done separately for sensitivity and specificity: 2,000 draws were made from the fitted beta distribution then, for each draw, there was a burn-in of 2,000 and the force of infection was estimated from 5,000 iterations thinned by 50. This was done separately for three scenarios): i) with sensitivity varying while holding specificity at 100%, ii) the reverse, and iii) with both parameters varying. Hence the distribution of the force of infection was estimated from 20,000 values. Assessment of convergence was done visually. For all MCMC models, the point estimate is taken to be the median of the iterative values and the 95% credible interval is from the 2.5th to 97.5th percentiles.

Ethical considerations
This Medellin study obtained ethical approval from the Ethics Committee of the University of Antioquia (reference 11-5-362) and the International Vaccine Institute (IVI, reference 2011-011). Data were made available without identifying information. Data from the Colombo study are included from a table in the previously published report [14].

Single serosurvey
For the dengue study in Colombo [14], Fig 2 shows the fitted proportions of seropositive by age. Values of 85% sensitivity or specificity have been chosen to illustrate the method rather than on the basis of expert opinion or of comparison against a gold standard. However, they are in the range found for other dengue IgG ELISAs [37,38]. As expected, imperfect sensitivity implies higher seroprevalence, and imperfect specificity the reverse. The consequences of other values of sensitivity and specificity are shown in Fig 3. The confidence bands reflect  [14]. The data point from the six-month old children in the published table are included, but not those aged less than six months, due to maternal antibodies in the younger group. The solid line is the fit from a standard analysis assuming a perfectly sensitive and specific test. The upper dashed line is from an analysis assuming 85% sensitivity and 100% specificity, and the lower dashed line with these values exchanged.
https://doi.org/10.1371/journal.pone.0247255.g002 sampling variability in the data rather than uncertainty in the values of sensitivity and specificity. From a standard frequentist analysis with binomial regression, the estimated force of infection is 13.7% per year (95% confidence interval 12.4-15.2%) . Fig 3 shows that, as expected, the results from the current model approach those from the standard analysis as sensitivity and specificity tend to 100%. For 100% sensitivity, the results from the current model are the same, and for 100% specificity, the point estimate is the same and the credible interval is 0.1% lower (12.3-15.1%).

Two consecutive serosurveys
In Medellin, 705 people had test results available for both surveys [25]. Of these, 260 originally tested negative, of whom 31 (11.9%) were positive on the second survey, approximately 26 The force of infection is estimated for each value of sensitivity or specificity, considered fixed. In this figure, when sensitivity is less than 100% then specificity is assumed to be 100%, and conversely. The grey zones are the 95% credible intervals. As sensitivity and specificity approach 100%, to the right side of the plot, the credible intervals approach the 95% confidence interval from standard binomial regression (vertical dashed line).
https://doi.org/10.1371/journal.pone.0247255.g003 months later. The remaining 445 originally tested positive, and all but four of these tested positive on the second survey. A standard frequentist binomial regression analysis, which was necessarily restricted to the 260 presumed at risk, estimates the force of infection as 5.9% per year, with a 95% confidence interval from 4.0 to 8.2%. Fig 4 shows the elicited distributions for the sensitivity and specificity of the dengue IgG ELISA used in the Medellin study, summarizing the beliefs of the three experts by linear pooling. The distribution for specificity is closer to 100% and with lower variance than that for sensitivity. The figure also shows the fitted beta and logistic-normal distributions, the former of which was used to generate the force of infection results in

PLOS ONE
Estimating force of infection from serologic surveys with imperfect tests Results from the force of infection model are shown in Fig 5. As expected from Fig 4, the distribution of the force of infection taking into account uncertainty in specificity (Fig 5a) has smaller variation than that for sensitivity (Fig 5b). For the former, the point estimate of the force of infection is 4.8% per year, with a 95% credible interval from 3.7 to 6.2%. For varying sensitivity, the point estimate is much higher, at 13.4% per year, and the credible interval much wider: 5.5% to 43.1%. With sensitivity and specificity both varying, the results are qualitatively similar to Fig 5b (S1 Fig), the point estimate is 13.3% per year, and the credible interval is from 4.6% to 43.0%.

Discussion
The current method is applicable to infections which induce a lasting immune response, which includes many viral pathogens, such as rubella and measles [18], hepatitis B and C, HIV [39], as well as dengue. Not all assays are suitable for serological surveys. For example, the World Health Organization discourages the use of rapid tests in such studies of dengue [5], and the utility of serological assays for SARS-CoV-2 is currently being debated [12,40]. Statistical methods can help quantify the degree of uncertainty that would arise from the use of any given test. Previous studies have simultaneously estimated test sensitivity and force of infection for single or repeated surveys [16][17][18], and estimated the force of infection subject to fixed values for sensitivity and specificity in a single survey [20]. Here we present more general methods for estimating the force of infection taking into account imperfect sensitivity and/or specificity, and uncertainty in these parameters, for either single or repeated surveys. Should well-established and generalizable values of sensitivity and specificity be available, they can be used in the methods described here. However, this is not always the case. For example, for dengue, there may be cross-reaction with other flaviviruses [30], whose occurrence varies geographically.
The single serosurvey model, applied to the dengue study in Colombo [14], showed how the estimated force of infection depends on the assumed sensitivity and specificity. When perfect sensitivity and specificity are assumed, the results are effectively identical to those from the standard binomial regression. For the example of repeated serosurveys in Medellin [25], the elicited expert belief for the specificity was relatively precise, resulting in a fairly precise estimate of the force of infection (95% credible interval 3.7 to 6.2% per year). The belief for sensitivity was less precise and resulted in an interval estimate that was so wide (5.5 to 43.1%) as to potentially lack utility. The results from these two studies illustrate the method, but the force of infection values should not be taken as authoritative for the study settings.
We have opted for estimation in a Bayesian framework by MCMC [22]. The model for a single serosurvey is similar to that of Lewis et al. for prevalence [21], and may be soluble by direct application of maximum likelihood, hence avoiding the need for iterative sampling. The identifiability of some Bayesian models for the estimation of prevalence is affected by the choice of priors for sensitivity, specificity and other parameters: unsuitable priors can then give rise to erroneous conclusions [34]. Although it may be possible to 'learn' about both a) sensitivity and specificity and b) the force of infection, here we have avoided identifiability concerns via Monte Carlo simulation of uncertainty in sensitivity and specificity. In effect, the elicited distribution is both the prior and posterior distribution. This approach was shown for the model for repeated surveys but could equally be applied to the one for a single survey. It was illustrated by eliciting beliefs about sensitivity and specificity from three experts: to reach substantive conclusions on dengue, a wider and more systematic exercise would be required [29]. Estimates from systematic reviews could be used instead of expert opinion if they were generalizable to a given study area.
Future work could seek models with Bayesian priors for sensitivity and specificity, while still correctly estimating the force of infection. In the meantime, the use of Monte Carlo in an outer loop, with MCMC estimation each time, makes the analysis relatively time-consuming. Also, a reformulation would be required to allow the inclusion of covariates. The method is shown for two surveys, studies with more than two could be included, with each being constrained to have a seroprevalence no lower than the previous. Another limitation is the assumption that each individual has long-lasting immunity, so that apparent seroreversions are due to test errors rather than waning immunity. The validity of this assumption will depend on the infection in question, and possibly factors such as the time between surveys, and the age and immunocompetence of the participants.
In conclusion, the methods presented here can make more realistic estimates of force of infection, and can help inform the choice of serological tests for future serosurveys.