PR conceived the idea. HJW, PR, and MJK designed the study. HJW formulated and analyzed the models. PR, HJW, and MJK contributed to writing the paper.

The authors have declared that no competing interests exist.

Mathematical models have become invaluable management tools for epidemiologists, both shedding light on the mechanisms underlying observed dynamics and making quantitative predictions about the effectiveness of different control measures. Here, we explain how substantial biases are introduced by two important, yet largely ignored, assumptions at the core of the vast majority of such models.

First, we use analytical methods to show that (i) ignoring the latent period or (ii) making the common assumption of exponentially distributed latent and infectious periods (when including the latent period) always results in underestimating the basic reproductive ratio of an infection from outbreak data. We then proceed to illustrate these points by fitting epidemic models to data from an influenza outbreak. Finally, we document how such unrealistic a priori assumptions concerning model structure give rise to systematically overoptimistic predictions on the outcome of potential management options.

This work aims to highlight that, when developing models for public health use, we need to pay careful attention to the intrinsic assumptions embedded within classical frameworks.

Two current biases in infectious disease models may substantially affect their usefulness in predicting disease outbreaks.

When a new infectious disease emerges, such as SARS, it is important to try to predict how the disease will behave, e.g., how infectious it is and what its latent period is, so that the spread of the disease through the population can be estimated and appropriate public health measures such as quarantining can be decided on.

The researchers assessed different currently used mathematical models of disease outbreaks, including models that took no account of latent periods, and another that assumed that the latent and infectious periods had a particular pattern, called exponential. They showed that both of these assumptions could potentially lead to underestimating the way the disease spreads. They then tested their predictions on a known outbreak of influenza that occurred in a school.

Public health officials may need to rethink the way that they try to predict outbreaks of infectious disease. Minimally, they need to be sure that they put into any model the most accurate predictions of the behavior of the disease.

The Health Protection Agency in the United Kingdom has a Web site that explains its work on assessing infectious disease outbreaks:

The Centers for Disease Control and Prevention in the United States is a good place to start for information on any new infectious diseases:

The past decade has seen a dramatic increase in the significance attached to infectious diseases from the public health perspective. This trend is due in part to the emergence of new and highly pathogenic infections such as Ebola [

The effective management and control of such infections is increasingly done with substantial input from mathematical models, which are used not only to provide information on the nature of the infection itself, through estimates of key parameters such as the basic reproductive ratio R_{0} [

The most commonly used framework for epidemiological systems is still the susceptible–infectious–recovered (SIR) class of models, in which the host population is categorized according to infection status as either susceptible, infectious, or recovered [

which translates mathematically into

where ^{−γt}/

(A) The change in the probability of remaining infectious as a function of time when the number of subdivisions within the infected class increases from

(B) The consequences of changes in _{0} = 5, and the same average infectious period,

The dynamical consequences of these differences in the distribution of infectious and latent periods have received some attention over the past two decades. It has been shown, for example, that the precise distribution of the infectious period has no qualitative effects on the asymptotic values or properties of the system [

Whether these marked differences between alternate model formulations may translate into potentially important public health concerns is a key question, which we address in two ways. First, we document systematic differences in the model parameters estimated from an epidemic using the exponential and gamma-distributed models. Second, we demonstrate that the use of exponential models produces overoptimistic predictions about the low levels of control required to subdue an epidemic.
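The contrast between exponential and gamma-distributed periods can be made concrete with a small simulation. The following is a minimal sketch, assuming the common "linear chain" construction in which the latent class is split into m sequential stages and the infectious class into n sequential stages, so that m = n = 1 gives exponential periods and larger values give more tightly distributed ones. The function name, the Euler integration scheme, and all parameter values are illustrative assumptions, not the authors' code.

```python
def simulate_staged_seir(beta, sigma, gamma, m, n, t_end, dt=0.01, seed=1e-4):
    """Euler integration of an SEIR model with m latent and n infectious stages.

    Assumed (hypothetical) parameterization: 1/sigma is the mean latent period,
    1/gamma the mean infectious period, and each stage is left at rate
    m*sigma or n*gamma. Use m=0 for a model with no latent class.
    All state variables are proportions of the population.
    """
    s = 1.0 - seed
    e = [0.0] * m
    i = [0.0] * n
    i[0] = seed  # the initial cases start in the first infectious stage
    peak, peak_time = 0.0, 0.0
    for step in range(int(round(t_end / dt))):
        total_i = sum(i)
        if total_i > peak:
            peak, peak_time = total_i, step * dt
        new_infections = beta * s * total_i
        # Pass the flow down the chain: each stage's outflow feeds the next stage.
        inflow = new_infections
        next_e = []
        for x in e:
            outflow = m * sigma * x
            next_e.append(x + dt * (inflow - outflow))
            inflow = outflow
        next_i = []
        for x in i:
            outflow = n * gamma * x
            next_i.append(x + dt * (inflow - outflow))
            inflow = outflow
        s -= dt * new_infections
        e, i = next_e, next_i
    return {"final_size": 1.0 - s, "peak": peak, "peak_time": peak_time}


if __name__ == "__main__":
    # Same basic reproductive ratio (beta/gamma = 5) and same mean infectious
    # period; only the number of infectious stages differs.
    exponential = simulate_staged_seir(5.0, 1.0, 1.0, m=0, n=1, t_end=40.0)
    gamma_like = simulate_staged_seir(5.0, 1.0, 1.0, m=0, n=10, t_end=40.0)
    print(exponential)
    print(gamma_like)
```

In this sketch the two runs end with essentially the same final epidemic size, while the timing of the epidemic peak differs, which is the kind of transient difference the text is concerned with.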

During the early phase of an epidemic, the observed exponential growth rate, λ, is related to the basic reproductive ratio, R_{0}, of the infection. Mathematically,

for the SEIR model with gamma-distributed latent and infectious periods (further details are given in _{0} in terms of _{0}.

To study the effects of contact tracing and isolation, we modify the assumptions of the SEIR epidemic model, while still incorporating gamma-distributed latent and infectious periods. In the new model, isolation of newly infectious cases occurs at a daily rate of _{I}_{D}_{A})_{S})_{Q},_{Q}

In a typical management setting, such as the SARS outbreak of 2003, public health professionals are confronted with a novel (or perhaps a highly virulent variant of an existing) pathogen that is spreading rapidly through a predominantly susceptible population. One of the important tasks of any modeling exercise is to provide insights into some of the epidemiological characteristics of the invading infection, such as its transmissibility, virulence, and persistence dynamics. Of great interest is the estimation of the basic reproductive ratio of the infection (R_{0}), which measures the transmission potential of the infection, and determines the degree of control required.

Some of these aspects can be explored by studying the range of model parameters that give (initial) outbreak dynamics consistent with the (short-term) epidemic data thus far gathered. One approach is to fit model parameters to data by “trajectory matching,” where one looks for the combination of parameters that, in a statistically rigorous sense, gives rise to dynamics most consistent with observed patterns [_{0}. First, we use this approach to examine, in general, how the distribution of the latent and infectious periods may affect the estimation of R_{0} from initial epidemic data. To illustrate that our results are not specific to this methodology, we then take incidence data from an influenza outbreak and parameterize the epidemic models using trajectory matching.
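As an illustration of trajectory matching, the sketch below fits a simple exponential SIR model to a time series by least squares. It is a minimal example under stated assumptions: the "data" are synthetic and noise-free, generated from the model itself with hypothetical parameter values, the search is a plain grid search, and all function names are invented here. It does not reproduce the fitting procedure or the data used in this paper.

```python
def sir_daily_infectious(beta, gamma, population, initial_infectious, days,
                         steps_per_day=20):
    """Number infectious at the end of each day for an exponential SIR model
    (Euler integration; beta and gamma are rates per day)."""
    s = float(population - initial_infectious)
    i = float(initial_infectious)
    dt = 1.0 / steps_per_day
    series = []
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population
            recoveries = gamma * i
            s -= dt * new_infections
            i += dt * (new_infections - recoveries)
        series.append(i)
    return series


def sum_squared_error(model, data):
    """Least squares error between model output and observations."""
    return sum((a - b) ** 2 for a, b in zip(model, data))


def fit_by_grid_search(data, population, initial_infectious, betas, gammas):
    """Return (error, beta, gamma) for the grid point with the smallest error."""
    best = None
    for beta in betas:
        for gamma in gammas:
            model = sir_daily_infectious(beta, gamma, population,
                                         initial_infectious, len(data))
            error = sum_squared_error(model, data)
            if best is None or error < best[0]:
                best = (error, beta, gamma)
    return best


if __name__ == "__main__":
    # Hypothetical "true" parameters used only to generate synthetic data.
    synthetic = sir_daily_infectious(1.5, 0.5, 1000, 1, days=14)
    betas = [k / 10 for k in range(10, 21)]   # 1.0 to 2.0 per day
    gammas = [k / 10 for k in range(2, 11)]   # 0.2 to 1.0 per day
    error, beta, gamma = fit_by_grid_search(synthetic, 1000, 1, betas, gammas)
    print("best fit:", beta, gamma, "implied R0:", beta / gamma)
```

With real, noisy incidence data one would typically replace the grid search with a numerical optimizer and a likelihood appropriate to the observation process; the grid search is used here only to keep the example self-contained.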

We may obtain an estimate for R_{0} by calculating the initial growth rate λ. The value of R_{0} estimated is crucially dependent on the fundamental assumptions made concerning the distributions of latent and infectious periods. Specifically, we find that the following equation determines the relationship between R_{0} and an empirically estimated epidemic growth rate,

where
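A relationship of this kind can be sketched as follows. The notation is an assumption made here for illustration: λ is the initial growth rate, 1/σ and 1/γ are the mean latent and infectious periods, and m and n are the numbers of stages used to represent the gamma-distributed latent and infectious periods. The first line is the general form for an SEIR model under those assumptions; the remaining lines are the limiting cases that follow from it algebraically.

```latex
\begin{align}
R_0 &= \frac{\lambda \left( \dfrac{\lambda}{\sigma m} + 1 \right)^{m}}
            {\gamma \left( 1 - \left( \dfrac{\lambda}{\gamma n} + 1 \right)^{-n} \right)}
    && \text{($m$ latent stages, $n$ infectious stages)} \\[6pt]
R_0 &= \left( 1 + \frac{\lambda}{\sigma} \right) \left( 1 + \frac{\lambda}{\gamma} \right)
    && \text{($m = n = 1$, exponential periods)} \\[6pt]
R_0 &= 1 + \frac{\lambda}{\gamma}
    && \text{(no latent period, $n = 1$)} \\[6pt]
R_0 &= \frac{\lambda \, e^{\lambda / \sigma}}{\gamma \left( 1 - e^{-\lambda / \gamma} \right)}
    && \text{($m, n \to \infty$, fixed periods)}
\end{align}
```

Because (1 + x/m)^m increases with m, a more tightly distributed latent period raises the estimate for a given λ, whereas a more tightly distributed infectious period (larger n) lowers it.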

The relationship between estimated _{0} and the distributions of the latent and infectious periods is demonstrated in _{0}. In general, as the infectious period becomes more tightly distributed (increasing _{0} are estimated for any given growth rate _{0} are estimated. Indeed, we may use the relationship given by _{0} and therefore will underestimate the level of global control measures (such as mass vaccination) that will be needed to control the epidemic.
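The dependence of the estimate on the assumed distributions can be explored numerically. The sketch below assumes the standard staged (gamma-distributed) SEIR relationship R_0 = λ(λ/(σm) + 1)^m / (γ(1 − (λ/(γn) + 1)^(−n))), where λ is the growth rate, 1/σ and 1/γ are the mean latent and infectious periods, and m and n are the numbers of latent and infectious stages. The function name and the numerical values are hypothetical and chosen only for illustration.

```python
def r0_from_growth_rate(growth_rate, infectious_period, n,
                        latent_period=None, m=0):
    """Estimate R0 from an initial exponential growth rate.

    Assumes a staged SEIR model: m latent stages (m=0 means no latent class)
    and n >= 1 infectious stages. Periods are means, in the same time units
    as 1/growth_rate.
    """
    gamma = 1.0 / infectious_period
    if m > 0:
        sigma = 1.0 / latent_period
        latent_factor = (growth_rate / (sigma * m) + 1.0) ** m
    else:
        latent_factor = 1.0  # no latent period in the model
    infectious_factor = 1.0 - (growth_rate / (gamma * n) + 1.0) ** (-n)
    return growth_rate * latent_factor / (gamma * infectious_factor)


if __name__ == "__main__":
    lam = 0.5  # hypothetical growth rate per day
    cases = {
        "SIR, exponential": dict(n=1),
        "SEIR, exponential": dict(n=1, latent_period=2.0, m=1),
        "SEIR, tighter infectious period": dict(n=10, latent_period=2.0, m=1),
        "SEIR, tighter latent period": dict(n=1, latent_period=2.0, m=10),
    }
    for label, kwargs in cases.items():
        print(label, round(r0_from_growth_rate(lam, 4.0, **kwargs), 2))
```

For the same growth rate and the same mean periods, the estimates differ only because of the assumed number of stages, which is why the choice of distribution matters when R_0 is inferred from early outbreak data.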

(A) The effects of changing the distributions of the latent and infectious periods on the estimated value of _{0}, with _{0} when the latent and infectious periods are both exponentially distributed (lower surface) or fixed (higher surface). We note that the shape of each surface is independent of the exact value of

(B) At higher values of λ, R_{0} may be substantially over- or underestimated using the classical exponentially distributed model (

While the results described in the previous section are based on the rate of epidemic take-off, we reach the same qualitative conclusions about the effects of the distributions of latent and infectious periods when estimating R_{0} by other data-fitting methods. For illustration, we use data from an influenza outbreak in an English boarding school [_{0} of the best fit for a combination of _{0}, and (ii) the assumption of exponentially distributed latent and infectious periods results in consistently lower estimates of R_{0} than their gamma-distributed counterparts.

(A and B) The least squares error (LSE) (A) and R_{0} (B) of the best-fit model under different assumptions about the distribution of the latent and infectious periods. (The label “w/o” denotes no latent class.)

(C) We plot the incidence data along with the SEIR best fit (

(D) The best-fit estimate of R_{0} changes for these two models as we increase the number of points used in the fitting procedure. When fitting the models, for each value of

Despite visually similar solutions, the SIR best-fit and SEIR best-fit models (_{0}: 3.74 for the SIR model versus 35.9 for the SEIR model, which is partly a result of the small population size. However, the best-fit estimate of R_{0} obtained from the gamma-distributed SEIR model (^{−1}), and then take the final estimates of the average latent and infectious periods to compute R_{0} using _{0} of 4.38 whereas for the SEIR model (_{0} of 16.9. Thus the initial rate of increase in incidence does well in estimating R_{0} for the exponentially distributed SIR model but significantly less well for the gamma-distributed SEIR model. Given that we are fitting an additional parameter, it is to be expected that a limited number of data points confounds the estimation of R_{0} when we include a latent period in the model assumptions. However, this also highlights that even when incorporating a latent period, estimates of R_{0} based on the initial epidemic growth rate may potentially underestimate the true value of R_{0}.
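To give a sense of what such differences in R_{0} imply for control, the sketch below applies the textbook threshold for a homogeneously mixing population, in which the proportion that must be immunized to prevent sustained spread is 1 − 1/R_{0}. This threshold is a standard result brought in here as an assumption for illustration; it is not a calculation taken from this paper, and the function name is hypothetical. The two input values are the SIR and SEIR best-fit estimates quoted above.

```python
def critical_vaccination_fraction(r0):
    """Proportion of the population to immunize so that the effective
    reproductive ratio falls below 1, assuming homogeneous mixing and a
    perfectly effective vaccine (textbook threshold 1 - 1/R0)."""
    if r0 <= 1.0:
        return 0.0  # the infection cannot invade, so no vaccination is needed
    return 1.0 - 1.0 / r0


if __name__ == "__main__":
    for label, r0 in [("SIR best fit", 3.74), ("SEIR best fit", 35.9)]:
        fraction = critical_vaccination_fraction(r0)
        print(f"{label}: R0 = {r0}, threshold = {fraction:.1%}")
```

Under these assumptions the two estimates translate into thresholds of roughly 73% and 97%, so a model that underestimates R_{0} also understates the coverage needed.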

The results outlined above highlight the pitfalls of making a priori assumptions concerning the distributions of latent and infectious periods when estimating parameters. Depending on the precise details, inappropriate model selection may give rise to either gross over- or underestimates for the basic reproductive ratio of an infection. However, even when parameter estimates are reliable, choice of model structure can also be very important when making recommendations concerning individual-level control strategies. Historically, it has been shown that contact tracing and the effective quarantine of infected individuals and those potentially exposed is an important means of infection management [

(A) The proportion of the population contracting an introduced infection is depicted as a function of the infected isolation rate _{I})

(B) The consequences of contact tracing.

In both, the surfaces represent predictions of the SEIR model with an exponential (colored surface) or gamma (black grid surface; _{Q}_{D}

The process of isolating infected individuals results in a reduction in the mean infectious period (see

To focus on the effects of the infectious period distribution on different courses of intervention we have assumed that all those who are quarantined and exposed are detected before the end of the quarantine period and are not released back into the general population. We have also formulated a model that takes into account the distribution of the latent period during quarantine and find similar qualitative results to those shown in

The use of models in epidemiology dates back almost a century, and while traditional models have often been highly successful in explaining observed dynamics [

The large discrepancies between estimates of R_{0} from the exponentially distributed and gamma-distributed fits reiterate the importance of accurately determining the precise distributions of latent and infectious periods. Although the data required for such a task are often available from post hoc analyses of epidemics, they are certainly lacking for a novel emerging infection. Instead, the uncertainty surrounding assumptions about the distributions should be incorporated into quantitative predictions made from epidemiological models, especially since this may well be greater than any uncertainty that arises from noise in the data. Of course, more sophisticated fitting methods than those used in this paper exist [

The take-home message from our work is that when developing models for public health use, we need to pay careful attention to the intrinsic assumptions embedded within classical frameworks. While some practitioners are already using the approach we advocate [


We are grateful to Andy Dobson for comments that helped to improve the manuscript. HJW and PR are supported by the National Institutes of Health, and MJK is supported by the Royal Society. PR would also like to thank the Ellison Medical Foundation for funding. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

R_{0} and a recipe for its calculation.

SEIR: susceptible–exposed–infectious–recovered

SIR: susceptible–infectious–recovered