Exposure misclassification bias in the estimation of vaccine effectiveness

In epidemiology, a typical measure of interest is the risk of disease conditional upon exposure. A common source of bias in the estimation of risks and risk ratios is misclassification. Exposure misclassification affects the measurement of exposure, i.e. the variable one conditions on. This article explains how to assess biases under non-differential exposure misclassification when estimating vaccine effectiveness, i.e. the vaccine-induced relative reduction in the risk of disease. The problem can be described in terms of three binary variables: the unobserved true exposure status, the observed but potentially misclassified exposure status, and the observed true disease status. The bias due to exposure misclassification is quantified by the difference between the naïve estimand defined as one minus the risk ratio comparing individuals observed as vaccinated with individuals observed as unvaccinated, and the vaccine effectiveness defined as one minus the risk ratio comparing truly vaccinated with truly unvaccinated. The magnitude of the bias depends on five factors: the risks of disease in the truly vaccinated and the truly unvaccinated, the sensitivity and specificity of exposure measurement, and vaccination coverage. Non-differential exposure misclassification bias is always negative. In practice, if the sensitivity and specificity are known or estimable from external sources, the true risks and the vaccination coverage can be estimated from the observed data and, thus, the estimation of vaccine effectiveness based on the observed risks can be corrected for exposure misclassification. When analysing risks under misclassification, careful consideration of conditional probabilities is crucial.


Introduction
Measurement error and misclassification result in information bias, i.e. a systematic error in the estimator of an exposure's effect on an outcome [1]. If the probability of misclassification in one variable does not depend on the level of other variables, misclassification is said to be non-differential. When estimating vaccine effectiveness, the exposure and outcome of interest are vaccination and occurrence of a given vaccine-preventable disease, respectively. The aim of measuring vaccine effectiveness is to quantify the relative reduction in the disease's risk or rate attributable to vaccination [2,3].
Under the assumption that the disease is rare or that the vaccine offers complete protection to a subset of the vaccinated individuals while leaving the rest unaffected, vaccine effectiveness can be estimated as one minus the risk ratio [2]. The two (cumulative) risks that are compared by the risk ratio are the two conditional probabilities of the disease in those who have been vaccinated and those who have not. While outcome misclassification affects the measurement of outcome conditional on exposure, exposure misclassification affects the measurement of exposure, i.e. the variable one should condition on.
De Smedt et al. [4] studied non-differential outcome and exposure misclassification bias in the estimation of vaccine effectiveness based on the risk ratio. Regarding outcome misclassification, their analysis is complete and in line with previous literature [5,6]. Their mathematical presentation of the bias caused by exposure misclassification is, however, not complete. We here provide a detailed derivation of the bias, emphasising the need for proper conditioning on a potentially misclassified exposure status. Finally, we show how the estimation of vaccine effectiveness can be adjusted for exposure misclassification if the sensitivity and specificity of exposure measurement are known or estimable from external sources. We use the register-based estimation of influenza vaccine effectiveness in the Finnish elderly as an example [7].

Methods
A person is considered either vaccinated if a vaccine under study has been administered or unvaccinated if the vaccine has not been administered. We refer to this classification as the true vaccination status. In a study, however, the collected vaccination information might be subject to underreporting, omission, coding error or inaccurate recall, so a person's true vaccination status is unknown to the investigator. We therefore refer to the classification of individuals into vaccinated and unvaccinated based on study data as the observed but potentially misclassified vaccination status.
The problem of studying vaccine effectiveness under exposure misclassification can be described in terms of three binary variables: the unobserved true vaccination (exposure) status (V), the observed but potentially misclassified vaccination (exposure) status (Ṽ), and the observed true disease (outcome) status (D). The association of interest is V → D, i.e. the risk (probability) of disease conditional on the true vaccination status, P(D = 1 | V). However, from observed data one can infer directly only the association Ṽ → D, i.e. the risk of disease conditional on the observed vaccination status, P(D = 1 | Ṽ).
Assuming the absence of other sources of bias, it follows from the dependencies depicted in Fig 1 that the joint distribution of the three variables factorises as

$$P(D, \tilde{V}, V) = P(D \mid V) \, P(\tilde{V} \mid V) \, P(V). \tag{1}$$

Let γ = P(V = 1) be the true vaccination coverage, i.e. the proportion of truly vaccinated individuals (Table 1). Exposure misclassification is quantified by the sensitivity and specificity of exposure measurement. The sensitivity is the probability SE = P(Ṽ = 1 | V = 1) of measuring correctly the vaccination status of a vaccinated individual. Correspondingly, the specificity is the probability SP = P(Ṽ = 0 | V = 0) of measuring correctly the vaccination status of an unvaccinated individual. The risks of disease in the truly vaccinated and the truly unvaccinated are π1 = P(D = 1 | V = 1) and π0 = P(D = 1 | V = 0), respectively. The effect measure of interest is the vaccine effectiveness defined by the estimand VE = 1 − π1/π0 (Table 1).
Since the true vaccination status and, therefore, γ, π1 and π0 are unobserved, we can only measure the observed vaccination coverage, P(Ṽ = 1), and the risks of disease conditional upon the observed but potentially misclassified vaccination status, p1 = P(D = 1 | Ṽ = 1) and p0 = P(D = 1 | Ṽ = 0) (Table 1).
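Table 1. Notation.

Parameter                            Symbol      Explanation
True vaccination coverage            γ           P(V = 1)
Sensitivity                          SE          P(Ṽ = 1 | V = 1)
Specificity                          SP          P(Ṽ = 0 | V = 0)
Risk in the truly vaccinated         π1          P(D = 1 | V = 1)
Risk in the truly unvaccinated       π0          P(D = 1 | V = 0)
Vaccine effectiveness                VE          1 − π1/π0
Observed vaccination coverage        P(Ṽ = 1)    SE·γ + (1 − SP)·(1 − γ)
Risk in the observed vaccinated      p1          P(D = 1 | Ṽ = 1)
Risk in the observed unvaccinated    p0          P(D = 1 | Ṽ = 0)

It follows from the conditional independence of D and Ṽ given V, as expressed in Eq (1), that

$$p_1 = P(D = 1 \mid \tilde{V} = 1) = \pi_1 \, P(V = 1 \mid \tilde{V} = 1) + \pi_0 \, P(V = 0 \mid \tilde{V} = 1), \tag{2A}$$

$$p_0 = P(D = 1 \mid \tilde{V} = 0) = \pi_1 \, P(V = 1 \mid \tilde{V} = 0) + \pi_0 \, P(V = 0 \mid \tilde{V} = 0). \tag{3A}$$

The observed risks p1 and p0 can thus be interpreted as weighted averages of the true risks π1 and π0. The four weights, each giving the probability of the true vaccination status conditioned upon the observed vaccination status, can be expressed in terms of three parameters, γ, SE and SP:

$$p_1 = \frac{\pi_1 \cdot SE \cdot \gamma + \pi_0 \cdot (1 - SP) \cdot (1 - \gamma)}{SE \cdot \gamma + (1 - SP) \cdot (1 - \gamma)} \tag{2B}$$

and

$$p_0 = \frac{\pi_1 \cdot (1 - SE) \cdot \gamma + \pi_0 \cdot SP \cdot (1 - \gamma)}{(1 - SE) \cdot \gamma + SP \cdot (1 - \gamma)}. \tag{3B}$$

If both the sensitivity SE ≈ 0.5 and specificity SP ≈ 0.5, i.e. if the observed vaccination status Ṽ = 1 is equally likely in truly vaccinated and unvaccinated individuals, the observed risks p1 and p0 are approximately identical (p1 ≈ p0) and the vaccine effectiveness is not identifiable from the data. We therefore assume hereafter that SE and SP are away from 0.5.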
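The weighted-average interpretation of the observed risks can be illustrated with a minimal Python sketch. The function names and all parameter values below are illustrative assumptions, not taken from the study; the weights are simply Bayes' theorem applied to γ, SE and SP.

```python
def conditional_weights(se, sp, gamma):
    """Probabilities of the true status given the observed status (Bayes' theorem)."""
    q = se * gamma + (1 - sp) * (1 - gamma)  # observed coverage P(obs = 1)
    return {
        ("V=1", "obs=1"): se * gamma / q,
        ("V=0", "obs=1"): (1 - sp) * (1 - gamma) / q,
        ("V=1", "obs=0"): (1 - se) * gamma / (1 - q),
        ("V=0", "obs=0"): sp * (1 - gamma) / (1 - q),
    }


def observed_risks(pi1, pi0, se, sp, gamma):
    """Observed risks p1 and p0 as weighted averages of the true risks."""
    w = conditional_weights(se, sp, gamma)
    p1 = pi1 * w[("V=1", "obs=1")] + pi0 * w[("V=0", "obs=1")]
    p0 = pi1 * w[("V=1", "obs=0")] + pi0 * w[("V=0", "obs=0")]
    return p1, p0


# Hypothetical inputs: true risks 3% and 10% (true VE = 0.7), SE = 0.9, SP = 0.95, coverage 50%.
weights = conditional_weights(se=0.9, sp=0.95, gamma=0.5)
p1, p0 = observed_risks(pi1=0.03, pi0=0.10, se=0.9, sp=0.95, gamma=0.5)
print(f"p1 = {p1:.4f}, p0 = {p0:.4f}, naive VE = {1 - p1 / p0:.3f}")
```

With these assumed inputs both observed risks lie between the two true risks, so the naïve effectiveness falls below the true value of 0.7.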
Solving Eqs (2B) and (3B) for π1 and π0, we obtain

$$\pi_1 = \frac{SP \cdot P(\tilde{V} = 1) \cdot p_1 - (1 - SP) \cdot P(\tilde{V} = 0) \cdot p_0}{\gamma \cdot (SE + SP - 1)} \tag{4}$$

and

$$\pi_0 = \frac{SE \cdot P(\tilde{V} = 0) \cdot p_0 - (1 - SE) \cdot P(\tilde{V} = 1) \cdot p_1}{(1 - \gamma) \cdot (SE + SP - 1)}, \tag{5}$$

where P(Ṽ = 1) = SE·γ + (1 − SP)·(1 − γ) and P(Ṽ = 0) = 1 − P(Ṽ = 1). The estimand VE follows as

$$VE = 1 - \frac{\pi_1}{\pi_0} = 1 - \frac{1 - \gamma}{\gamma} \cdot \frac{SP \cdot P(\tilde{V} = 1) \cdot p_1 - (1 - SP) \cdot P(\tilde{V} = 0) \cdot p_0}{SE \cdot P(\tilde{V} = 0) \cdot p_0 - (1 - SE) \cdot P(\tilde{V} = 1) \cdot p_1}. \tag{6}$$

This expression still involves the unobserved true vaccination coverage γ. However, by solving P(Ṽ = 1) = SE·γ + (1 − SP)·(1 − γ) for γ, γ can be expressed in terms of the observed vaccination coverage P(Ṽ = 1), sensitivity SE and specificity SP:

$$\gamma = \frac{P(\tilde{V} = 1) - (1 - SP)}{SE + SP - 1}. \tag{7}$$

Since VE thus depends only on parameters that can be estimated from the observed data (P(Ṽ = 1), p1, p0) or might be known or estimable from external sources (SE, SP), the vaccine effectiveness can be estimated even under exposure misclassification. In the absence of exposure misclassification (SE = SP = 1), Eq (6) simplifies to the standard expression of vaccine effectiveness based on observed risks. Under exposure misclassification, however, this naïve estimand, (1−p1/p0), differs from the correct estimand, VE. The difference Δ of the two estimands quantifies the bias due to exposure misclassification:

$$\Delta = \left(1 - \frac{p_1}{p_0}\right) - VE = \frac{\pi_1}{\pi_0} - \frac{\pi_1 \cdot SE \cdot \gamma + \pi_0 \cdot (1 - SP) \cdot (1 - \gamma)}{\pi_1 \cdot (1 - SE) \cdot \gamma + \pi_0 \cdot SP \cdot (1 - \gamma)} \cdot \underbrace{\frac{1 - SE \cdot \gamma - (1 - SP) \cdot (1 - \gamma)}{SE \cdot \gamma + (1 - SP) \cdot (1 - \gamma)}}_{X}. \tag{8}$$

The last term, X, is due to the denominators in Eqs (2B) and (3B) and results from correct conditioning on the observed vaccination status. Of note, De Smedt et al. [4] expressed the observed risks p1 and p0 omitting these denominators and their mathematical presentation of bias Δ thus misses term X.
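The correction described above can be sketched in Python as follows. This is a minimal illustration under the stated model, obtained by solving the two weighted-average relations for the true risks; the function name and the input values are hypothetical, and the algebraic form used here may differ from the printed equations while being equivalent to them.

```python
def correct_for_misclassification(q, p1, p0, se, sp):
    """Recover true coverage, true risks and VE from observed quantities.

    q  : observed vaccination coverage, P(obs = 1)
    p1 : observed risk in those observed as vaccinated
    p0 : observed risk in those observed as unvaccinated
    """
    youden = se + sp - 1
    if abs(youden) < 1e-12:
        raise ValueError("SE + SP = 1: vaccine effectiveness is not identifiable")
    gamma = (q - (1 - sp)) / youden  # true vaccination coverage
    pi1 = (sp * q * p1 - (1 - sp) * (1 - q) * p0) / (gamma * youden)
    pi0 = (se * (1 - q) * p0 - (1 - se) * q * p1) / ((1 - gamma) * youden)
    return gamma, pi1, pi0, 1 - pi1 / pi0


# Hypothetical observed data generated from pi1 = 0.03, pi0 = 0.10, gamma = 0.5,
# SE = 0.9 and SP = 0.95; the correction should return those true values.
gamma, pi1, pi0, ve = correct_for_misclassification(
    q=0.475, p1=0.016 / 0.475, p0=0.049 / 0.525, se=0.9, sp=0.95
)
print(f"gamma = {gamma:.3f}, pi1 = {pi1:.3f}, pi0 = {pi0:.3f}, VE = {ve:.3f}")
```

The guard on SE + SP = 1 reflects the non-identifiable case in which the observed status carries no information about the true status.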
In addition, we demonstrate how the estimation of vaccine effectiveness can be adjusted for exposure misclassification taking the register-based estimation of influenza vaccine effectiveness in the Finnish elderly (aged 65 years and above) in 2016/17 as an example [7]. All individuals in the study population are classified as vaccinated or unvaccinated based on their records in the Finnish vaccination register giving the observed and potentially misclassified vaccination status Ṽ. Assuming the register's specificity to be perfect, its sensitivity SE is first evaluated as the ratio of the observed vaccination coverage in the subpopulation of individuals aged 70 to 79 years and the corresponding vaccination coverage in a representative survey [8], which we assume to reflect the true vaccination coverage in that subpopulation. We then use Eq (7) to calculate the true vaccination coverage γ for the whole study population and Eq (6) to assess the vaccine effectiveness using the values of γ, SE and SP = 1, as well as the observed influenza risks p1 and p0.

Results
Five parameters determine the magnitude of exposure misclassification bias: the risks of the disease in the truly vaccinated (π1) and the truly unvaccinated (π0), the sensitivity (SE), the specificity (SP) and the true vaccination coverage (γ). The last term, X, in Eq (8) arising from proper conditioning depends on three parameters: true vaccination coverage, sensitivity and specificity. Because X describes a ratio of two probabilities, its magnitude ranges from 0 to infinity. Fig 2 presents the effect size measured by the estimands (1−p1/p0) and VE under non-differential exposure misclassification. While VE accurately measures the vaccine effectiveness at 0.7, (1−p1/p0) does not quantify the vaccine effectiveness correctly unless both sensitivity and specificity are perfect. The vertical distance between the two effect measures depicted in Fig 2 marks the magnitude of the bias. The bias is always negative. As imperfect sensitivity leads to misclassification of the truly vaccinated, its impact is strongest when the true vaccination coverage is high. Vice versa, the impact of imperfect specificity, which leads to misclassification of the truly unvaccinated, is strongest when the true vaccination coverage is low.
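The dependence of the bias on vaccination coverage can be checked numerically with a short Python sketch. The true risks (3% and 10%, giving a true vaccine effectiveness of 0.7), the sensitivity and specificity values and the three coverage levels are assumptions chosen for illustration only.

```python
def naive_ve(pi1, pi0, se, sp, gamma):
    """Naive estimand 1 - p1/p0 implied by the true parameters."""
    q = se * gamma + (1 - sp) * (1 - gamma)  # observed coverage
    p1 = (pi1 * se * gamma + pi0 * (1 - sp) * (1 - gamma)) / q
    p0 = (pi1 * (1 - se) * gamma + pi0 * sp * (1 - gamma)) / (1 - q)
    return 1 - p1 / p0


PI1, PI0 = 0.03, 0.10          # hypothetical true risks
TRUE_VE = 1 - PI1 / PI0        # 0.7
COVERAGES = [0.1, 0.5, 0.9]    # hypothetical true coverage levels

# Bias (naive minus true) with imperfect sensitivity only, then imperfect specificity only.
bias_imperfect_se = [naive_ve(PI1, PI0, 0.8, 1.0, g) - TRUE_VE for g in COVERAGES]
bias_imperfect_sp = [naive_ve(PI1, PI0, 1.0, 0.8, g) - TRUE_VE for g in COVERAGES]

for g, b_se, b_sp in zip(COVERAGES, bias_imperfect_se, bias_imperfect_sp):
    print(f"coverage {g:.1f}: bias (SE=0.8) = {b_se:+.3f}, bias (SP=0.8) = {b_sp:+.3f}")
```

Under these assumptions the bias from imperfect sensitivity grows in magnitude with coverage, while the bias from imperfect specificity shrinks.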
In the example from the 2016/17 influenza season, the vaccination coverage in the elderly aged 70 to 79 years was 51% according to the register [9] and 64% according to the survey [8]. The sensitivity (SE) of the register-based exposure measurement was thus estimated at 80% (= 51%/64% × 100%). The observed vaccination coverage in the study population, P(Ṽ = 1), was 47% and the estimated cumulative risks (p1 and p0) were 16% and 20%, respectively [7]. It follows from Eqs (7) and (6) that the true vaccination coverage (γ) was 59% and the vaccine effectiveness (VE) was 24%. The vaccine effectiveness was thus 4 percentage points higher than the naïve estimate, (1−p1/p0) = 20%, would have suggested.
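The arithmetic of this example can be retraced with a few lines of Python. The input figures are those quoted above; the variable names are illustrative, and the formulas are the correction under perfect specificity (SP = 1) solved from the weighted-average relations, so this is a sketch rather than the authors' own code.

```python
# Figures quoted in the text for the 2016/17 season.
register_coverage_70_79 = 0.51
survey_coverage_70_79 = 0.64
q = 0.47    # observed coverage in the study population
p1 = 0.16   # observed risk in those registered as vaccinated
p0 = 0.20   # observed risk in those registered as unvaccinated
sp = 1.0    # specificity assumed perfect

se = register_coverage_70_79 / survey_coverage_70_79   # estimated sensitivity
gamma = (q - (1 - sp)) / (se + sp - 1)                  # true coverage

# True risks recovered from the observed risks.
pi1 = (sp * q * p1 - (1 - sp) * (1 - q) * p0) / (gamma * (se + sp - 1))
pi0 = (se * (1 - q) * p0 - (1 - se) * q * p1) / ((1 - gamma) * (se + sp - 1))

ve_corrected = 1 - pi1 / pi0
ve_naive = 1 - p1 / p0

print(f"SE = {se:.0%}, gamma = {gamma:.0%}")
print(f"naive VE = {ve_naive:.0%}, corrected VE = {ve_corrected:.0%}")
```

With perfect specificity everyone registered as vaccinated is truly vaccinated, so the recovered risk π1 equals p1 and the whole correction comes from removing misclassified vaccinees from the unvaccinated group.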

Discussion
In this article, we derived an expression for the magnitude of non-differential exposure misclassification bias in the estimation of vaccine effectiveness based on risk ratios. The bias depends on five factors: true vaccination coverage, sensitivity and specificity of exposure measurement, and the risk of the disease of interest in the truly vaccinated and the truly unvaccinated. If the sensitivity and specificity are known or estimable from external sources, Eq (6) can be used to correct the estimation of vaccine effectiveness based on the observed risks for exposure misclassification. In the absence of exact information about the sensitivity and specificity of exposure measurement, Eq (8) can be used for assessing the potential magnitude of bias given a range of plausible values. Our findings are in line with previous literature. In contrast to non-differential outcome misclassification, non-differential exposure misclassification leads to bias [4]. The result that this bias is always negative must not be misinterpreted as a generic rule that any naïve estimate derived under exposure misclassification would be an underestimate of vaccine effectiveness. Due to random error, a single naïve estimate can under- or overestimate the true effect [10,11]. This random error remains even after adjusting for the bias as described in this paper. Which of the two errors dominates depends on the true values of the underlying parameters.
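A sensitivity analysis over plausible values of SE and SP might look like the following Python sketch. It evaluates the bias as the true risk ratio minus the observed risk ratio, with the observed ratio written as a product whose second factor is the ratio of observed unvaccinated to observed vaccinated proportions. The grid, the true risks and the coverage are hypothetical choices, not values from the study.

```python
from itertools import product


def bias_delta(pi1, pi0, se, sp, gamma):
    """Bias of the naive estimand: (1 - p1/p0) - (1 - pi1/pi0)."""
    first = (pi1 * se * gamma + pi0 * (1 - sp) * (1 - gamma)) / (
        pi1 * (1 - se) * gamma + pi0 * sp * (1 - gamma)
    )
    # Ratio of observed unvaccinated to observed vaccinated proportions.
    x = (1 - se * gamma - (1 - sp) * (1 - gamma)) / (
        se * gamma + (1 - sp) * (1 - gamma)
    )
    return pi1 / pi0 - first * x


SE_VALUES = [0.7, 0.8, 0.9, 1.0]   # hypothetical plausible sensitivities
SP_VALUES = [0.9, 0.95, 1.0]       # hypothetical plausible specificities

grid = {
    (se, sp): bias_delta(pi1=0.03, pi0=0.10, se=se, sp=sp, gamma=0.5)
    for se, sp in product(SE_VALUES, SP_VALUES)
}
worst = min(grid, key=grid.get)    # most negative bias on the grid

for (se, sp), delta in sorted(grid.items()):
    print(f"SE = {se:.2f}, SP = {sp:.2f}: bias = {delta:+.3f}")
print("largest bias at (SE, SP) =", worst)
```

Reporting the whole grid rather than a single value makes it visible how quickly the bias grows as either parameter moves away from 1.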
We described the problem of studying vaccine effectiveness under exposure misclassification and absence of other sources of bias in terms of three binary variables and formulated the joint probabilities of the three model quantities. Our exposure misclassification model (Fig 1) is a special case of the model presented by Tang et al. [12] for misclassification in both exposure and outcome. This approach of jointly modelling all observables and conditioning on the actually observed variables facilitates the connection between conditional probabilities and standard parameters such as sensitivity, specificity and risks, avoiding fallacies like the one pointed out in this paper. Moreover, if exposure misclassification should be differential, it can be easily incorporated in the initial model by allowing dependence of the sensitivity and specificity on the disease status or other variables.
This article pointed out the importance of proper conditioning. If misclassification in the exposure status is not accurately taken into account, expressions quantifying the magnitude of bias will be wrong. Based on the correct equations, a potentially biased vaccine effectiveness (or more generally any risk ratio) estimate can be adjusted for exposure misclassification. Nevertheless, this requires that the sensitivity and specificity of exposure measurement are known or are estimable from external sources.