Estimating the case-fatality risk (CFR)—the probability that a person dies from an infection given that they are a case—is a high priority in epidemiologic investigation of newly emerging infectious diseases and sometimes in new outbreaks of known infectious diseases. The data available to estimate the overall CFR are often gathered for other purposes (e.g., surveillance) in challenging circumstances. We describe two forms of bias that may affect the estimation of the overall CFR—preferential ascertainment of severe cases and bias from reporting delays—and review solutions that have been proposed and implemented in past epidemics. Also of interest is the estimation of the causal impact of specific interventions (e.g., hospitalization, or hospitalization at a particular hospital) on survival, which can be estimated as a relative CFR for two or more groups. When observational data are used for this purpose, three more sources of bias may arise: confounding, survivorship bias, and selection due to preferential inclusion in surveillance datasets of those who are hospitalized and/or die. We illustrate these biases and caution against causal interpretation of differential CFR among those receiving different interventions in observational datasets. Again, we discuss ways to reduce these biases, particularly by estimating outcomes in smaller but more systematically defined cohorts ascertained before the onset of symptoms, such as those identified by forward contact tracing. Finally, we discuss the circumstances in which these biases may affect non-causal interpretation of risk factors for death among cases.
Citation: Lipsitch M, Donnelly CA, Fraser C, Blake IM, Cori A, Dorigatti I, et al. (2015) Potential Biases in Estimating Absolute and Relative Case-Fatality Risks during Outbreaks. PLoS Negl Trop Dis 9(7): e0003846. https://doi.org/10.1371/journal.pntd.0003846
Editor: Alison P. Galvani, Yale University, UNITED STATES
Published: July 16, 2015
Copyright: © 2015 Lipsitch et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Research reported in this publication was supported by the National Institute of General Medical Sciences of the US National Institutes of Health under Award Number U54GM088558. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. It was also supported by the Bill and Melinda Gates Foundation, the European Union Seventh Framework Programme [FP7/2007–2013] under Grant Agreement no. 278433-PREDEMICS and by Centre funding from the UK Medical Research Council. No funding bodies had any role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The case-fatality risk (CFR) is a key quantity in characterizing new infectious agents and new outbreaks of known agents. The CFR can be defined as the probability that a case dies from the infection. Several variations of the definition of “case” are used for different infections, as discussed in Box 1. Under all these definitions, the CFR characterizes the severity of an infection and is useful for planning and determining the intensity of a response to an outbreak [1,2]. Moreover, the CFR may be compared between cases who do and do not receive particular treatments as a way of trying to estimate the causal impact of these treatments on survival. Such causal inference might ideally be done in a randomized trial in which individuals are randomly assigned to treatments, but this is often not possible during an outbreak for logistical, ethical, and other reasons. Therefore, observational estimates of CFR under different treatment conditions may be the only available means to assess the impact of various treatments.
Box 1. Definition of the CFR.
The CFR itself is an ambiguous term, as its definition and value depend on what qualifies an individual to be a “case.” Several different precise definitions of CFR have been used in practice, as have several imprecise ones. The infection-fatality risk (sometimes written IFR) defines a case as a person who has shown evidence of infection, either by clinical detection of the pathogen or by seroconversion or other immune response. Such individuals may or may not be symptomatic, though asymptomatic ones may go undetected. The symptomatic case-fatality risk (sCFR) defines a case as someone who is infected and shows certain symptoms. Infection in many outbreaks is given several gradations, including confirmed (definitive laboratory confirmation), probable (high degree of suspicion, by various clinical and epidemiologic criteria, without laboratory confirmation), and possible or suspected (lower degree of suspicion). This paper describes issues in estimating any of these risks or comparing them across groups, but does not go into the details of each possible definition.
Furthermore, unlike risks commonly used in epidemiologic research (e.g., the 5-year mortality risk), the length of the period during which deaths are counted for the CFR is rarely explicit, probably because it is considered to be short enough to avoid ambiguity in the definition of CFR. However, a precise definition of the CFR would need to include the risk period, e.g., the 1-month CFR of Ebola. Clearly, the definition of CFR for a particular investigation should be specified as precisely as possible.
However, observational studies conducted in the early phases of an outbreak, when public health authorities are appropriately concentrating on crisis response and not on rigorous study design, are challenging. A common problem is that disease severity of the cases recorded in a surveillance database will differ, perhaps substantially, from that of all cases in the population. This issue has arisen in the present epidemic of Ebola virus disease in West Africa and in many previous outbreaks and epidemics [4–9] and will continue to arise in future ones.
Here we outline two biases that may occur when estimating the CFR in a population from a surveillance database, and three more biases that may occur when comparing the CFR between subgroups to estimate the causal effect of medical interventions. We also briefly consider the applicability of these biases to a different application: comparing the CFR across different groups of people, for example, by geography, sex, age, comorbidities, and other “unchangeable” risk factors. Such factors are “unchangeable” in the sense that they are not candidates for intervention in the setting of the outbreak, though some could, of course, change over longer timescales. The goal of estimating the CFR in groups defined by such unchangeable factors is not to understand the causal role of these factors in mortality, but to develop a predictive model for mortality that might be used to improve prognostic accuracy or identify disparities. Such predictions may be affected by survivorship bias and selection bias, but not by confounding, as we discuss.
Biases Affecting the Estimation of the Overall CFR
Two biases that may affect the estimation of an overall CFR are presented in Table 1:
Preferential ascertainment of severe cases
For diseases that have a spectrum of clinical presentation, those cases that come to the attention of public health authorities and are entered into surveillance databases will typically be people with the most severe symptoms, who seek medical care, are admitted to hospital, or die. Therefore, the CFR will typically be higher among detected cases than among the entire population of cases, given that the latter may include individuals with mild, subclinical, and (under some definitions of “case”) asymptomatic presentations. Laboratory confirmation as an inclusion criterion may reduce this bias if it is able to detect a wider spectrum of presentations, or may exacerbate it if the probability of receiving a laboratory test is higher for more severe cases and/or if test sensitivity is higher for more severe cases. The magnitude of this bias may be uncertain for a long period because the spectrum of clinical presentations is itself uncertain at the start of an outbreak of a new disease [12,26]. All proposed approaches to estimate and correct for this bias (Table 1) require auxiliary data sources to estimate how the reported subset of cases compares with the overall population of cases. The availability of such auxiliary data sources will depend on the context of the outbreak.
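The direction of this bias can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that cases fall into two severity strata, each with its own fatality probability and its own probability of entering the surveillance database. The function names, the strata, and every number below are hypothetical and are not taken from any outbreak dataset.

```python
# Illustrative sketch: CFR among detected cases versus CFR among all cases.
# All strata and numbers are hypothetical assumptions, not outbreak data.

def cfr_all_cases(strata):
    """CFR over every case in the population, detected or not."""
    cases = sum(s["cases"] for s in strata)
    deaths = sum(s["cases"] * s["fatality"] for s in strata)
    return deaths / cases


def cfr_detected_cases(strata):
    """CFR among cases that reach the surveillance database.

    Assumes detection depends only on the severity stratum, so the
    fatality probability within a stratum is the same for detected
    and undetected cases.
    """
    detected = sum(s["cases"] * s["p_detect"] for s in strata)
    detected_deaths = sum(
        s["cases"] * s["p_detect"] * s["fatality"] for s in strata
    )
    return detected_deaths / detected


# Hypothetical population: many mild cases that are rarely detected,
# fewer severe cases that are usually detected.
strata = [
    {"label": "mild", "cases": 800, "fatality": 0.01, "p_detect": 0.1},
    {"label": "severe", "cases": 200, "fatality": 0.30, "p_detect": 0.9},
]

true_cfr = cfr_all_cases(strata)
observed_cfr = cfr_detected_cases(strata)
print(f"CFR among all cases:      {true_cfr:.3f}")
print(f"CFR among detected cases: {observed_cfr:.3f}")
```

In practice the detection probabilities are unknown, which is why the approaches in Table 1 rely on auxiliary data sources; the sketch only shows that over-representation of severe cases pushes the CFR among detected cases above the CFR among all cases.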
Bias due to delayed reporting of death
During an ongoing epidemic, there is a delay between the time someone dies and the time their death is reported. Therefore, at any moment in time, the list of cases includes people who will die and whose death has not yet occurred, or has occurred but not yet been reported. Thus dividing the cumulative number of reported deaths by the cumulative number of reported cases at any moment will underestimate the true CFR. The key determinants of the magnitude of the bias are the epidemic growth rate and the distribution of delays from case-reporting to death-reporting; the longer the delays and the faster the growth rate, the greater the bias. Heuristically, the underestimate will be proportionate to the expansion of the epidemic during the delay between the time a case enters the database and the time the death of that case enters the database (if it occurs). Fig 1 illustrates an example where the delay is 3 weeks, the epidemic doubling time is 2 weeks, and the underestimate is by a factor of 2^(3/2) ≈ 2.8.
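The heuristic above can be checked with a short Python sketch. It assumes cumulative reported cases grow exactly exponentially and that every death is reported a fixed number of weeks after the corresponding case; real delays follow a distribution, so this is an idealization. The function names and the true CFR of 0.5 are hypothetical choices made for illustration only.

```python
# Illustrative sketch of the reporting-delay bias under two simplifying
# assumptions: exact exponential growth and a single fixed delay.

def underestimate_factor(delay_weeks, doubling_time_weeks):
    """Factor by which cumulative deaths / cumulative cases understates the CFR."""
    return 2.0 ** (delay_weeks / doubling_time_weeks)


def naive_cfr_by_week(true_cfr, delay_weeks, doubling_time_weeks, n_weeks):
    """Naive CFR (cumulative reported deaths / cumulative reported cases) per week.

    delay_weeks must be a whole number of weeks because it is used as an
    index offset into the weekly series.
    """
    # Cumulative reported cases by week w, starting from 1 case at week 0.
    cum_cases = [2.0 ** (w / doubling_time_weeks) for w in range(n_weeks + 1)]
    naive = []
    for w in range(n_weeks + 1):
        if w < delay_weeks:
            # No deaths have been reported yet in this toy setup.
            cum_deaths = 0.0
        else:
            # Deaths reported by week w come from cases reported by week w - delay.
            cum_deaths = true_cfr * cum_cases[w - delay_weeks]
        naive.append(cum_deaths / cum_cases[w])
    return naive


factor = underestimate_factor(delay_weeks=3, doubling_time_weeks=2)
naive = naive_cfr_by_week(
    true_cfr=0.5, delay_weeks=3, doubling_time_weeks=2, n_weeks=20
)
# Multiplying the naive estimate by the factor recovers the assumed true CFR.
corrected = naive[-1] * factor
print(f"underestimate factor: {factor:.2f}")
print(f"naive CFR at week 20: {naive[-1]:.3f}")
print(f"rescaled CFR:         {corrected:.3f}")
```

With a 3-week delay and a 2-week doubling time, the naive ratio settles at the assumed CFR divided by 2^(3/2). The rescaling step works here only because the growth rate and delay are known exactly, which is rarely the case in a real outbreak.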
In an ongoing epidemic, there will typically be a delay between the reporting of a case and the reporting of the death of that case, if the infected person dies. Thus, at any moment, there will be some cases reported who will die of the infection but who have not yet died, or whose deaths have not yet been reported. Simple division of the number of deaths reported by week w (green), by the number of cases reported by week w (blue) will underestimate the CFR because the numerator does not include all those cases in the denominator who will eventually die. With a reporting delay of 3 weeks for deaths compared to cases, the reported deaths curve will be shifted 3 weeks to the right, relative to the curve of the total number of cases reported by week w who will die (red). If the epidemic d