On the formalism of the screening paradox

Bayes’ theorem imposes inevitable limitations on the accuracy of screening tests by tying the test’s predictive value to the disease prevalence. This limitation is independent of the adequacy and make-up of the test and thus implies inherent Bayesian limitations to the screening process itself. As per the WHO’s Wilson-Jungner criteria, one of the prerequisite steps before undertaking screening is to ensure that a treatment for the condition screened for exists. However, when applying screening programs in closed systems, a paradox, henceforth termed the “screening paradox”, ensues. If a disease process is screened for and subsequently treated, its prevalence would drop in the population, which, as per Bayes’ theorem, would make the test’s predictive value drop in turn. Put another way, a very powerful screening test would, by performing and succeeding at the very task it was developed to do, paradoxically reduce its ability to correctly identify individuals with the disease it screens for in the future, over some time t. In this manuscript, we explore the mathematical model which formalizes said screening paradox and explore its implications for population-level screening programs. In particular, we define the number of positive test iterations (PTI) needed to reverse the effects of the paradox. Given their theoretical nature, clinical application of the concepts herein reported needs validation prior to implementation. Meanwhile, an understanding of how the dynamics of prevalence can affect the PPV over time can help inform clinicians as to the reliability of a screening test’s results.


Bayes' Theorem and predictive values
Bayes' theorem describes the probability of an event occurring based on prior knowledge of conditions related to that specific event [1]. The essence of the Bayesian approach is to provide a mathematical model explaining how existing beliefs change in light of new evidence [2]. Bayes' theorem has applications in innumerable fields and, not surprisingly, has significant implications in epidemiological modelling as well. From Bayes' theorem, we can derive the positive predictive value (PPV) ρ(ϕ) of a screening test, defined as the percentage of patients with a positive screening test that do in fact have the disease screened for, as follows [1]:

$$\rho(\phi) = \frac{a\phi}{a\phi + (1-b)(1-\phi)} \tag{1}$$

where ρ(ϕ) = PPV, a = sensitivity, b = specificity and ϕ = prevalence. The PPV ρ(ϕ) is therefore a function of the disease prevalence, ϕ. As the prevalence increases, ρ(ϕ) also increases and vice-versa [3].
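As a minimal sketch of the relationship above, the following Python function evaluates the PPV in its standard Bayes form, ρ(ϕ) = aϕ / (aϕ + (1 − b)(1 − ϕ)). The function name `ppv` and the sample values are illustrative assumptions rather than part of the original work.

```python
def ppv(a: float, b: float, phi: float) -> float:
    """Positive predictive value rho(phi).

    a: sensitivity, b: specificity, phi: prevalence (all as fractions in [0, 1]).
    """
    true_pos = a * phi                     # P(test positive and diseased)
    false_pos = (1.0 - b) * (1.0 - phi)    # P(test positive and not diseased)
    return true_pos / (true_pos + false_pos)


# Illustrative values only: 95% sensitivity, 99% specificity.
for phi in (0.01, 0.10, 0.50):
    print(f"prevalence={phi:.2f}  PPV={ppv(0.95, 0.99, phi):.3f}")
```

The loop shows the PPV rising with prevalence for a fixed test, which is the dependence the rest of the argument builds on.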

Bayesian dynamics of predictive values
Let us define a hypothetical disease present in a population. Said condition has a preclinical phase and is amenable to screening through a given test designed to detect it. The test therefore has all of the pertinent screening parameters: sensitivity, specificity, and negative and positive predictive values. Finally, as required by the WHO's Wilson-Jungner criteria [4], a treatment for the condition screened exists. Let us denote ϕ₀ as the original or initial prevalence before screening is undertaken. As per Eq (1), we thus obtain a positive predictive value ρ(ϕ₀) at t₀:

$$\rho(\phi_0) = \frac{a\phi_0}{a\phi_0 + (1-b)(1-\phi_0)} \tag{2}$$

Assuming no new cases, it follows that as individuals with a positive screening test are treated, the disease prevalence ϕ₀ drops by some magnitude k, which represents the percentage reduction in prevalence. Consequently, the predictive value ρ(ϕ) will drop by some factor as well, so that individuals who test positive at some time tₖ > 0 experience a positive predictive value of:

$$\rho(\phi_0 - k) = \frac{a(\phi_0 - k)}{a(\phi_0 - k) + (1-b)(1-(\phi_0 - k))} \tag{3}$$

From the above equations, we define the ratio z as follows:

$$z = \frac{\rho(\phi_0 - k)}{\rho(\phi_0)} \tag{4}$$

where 0 < z < 1 and k > 0. Given the shape of the screening curve, and the principle of the prevalence threshold, even small changes in the prevalence ϕ can produce significant changes in the positive predictive value ρ(ϕ). To determine the degree of reduction in the predictive value of the screening test at time tₖ, we take the ratio of ρ(ϕ) at two different times, t₀ and some later time tₖ with a reduced prevalence of ϕ₀ − k, where k < ϕ₀ is the percentage reduction in prevalence:

$$z = \frac{a(\phi_0 - k)}{a(\phi_0 - k) + (1-b)(1 - \phi_0 + k)} \cdot \frac{a\phi_0 + (1-b)(1-\phi_0)}{a\phi_0} \tag{5}$$

Since ϕ₀ − k yields a new, lower prevalence ϕₖ, we can re-write the above equation as:

$$z = \frac{a\phi_k}{a\phi_k + (1-b)(1-\phi_k)} \cdot \frac{a\phi_0 + (1-b)(1-\phi_0)}{a\phi_0} \tag{6}$$

Expanding the parentheses and simplifying the expression, where the sum of the sensitivity a and specificity b is defined by ε = a + b, we obtain:

$$z = \frac{\phi_k\,[\phi_0(\varepsilon - 1) + 1 - b]}{\phi_0\,[\phi_k(\varepsilon - 1) + 1 - b]} \tag{7}$$

The term ε − 1 = a + b − 1 has been previously defined in the context of receiver-operating characteristic (ROC) curves, and is termed Youden's J statistic [5]. As such, we can re-write the above equation as:

$$z(\phi_0, k) = \frac{\phi_k\,(J\phi_0 + 1 - b)}{\phi_0\,(J\phi_k + 1 - b)} \tag{8}$$

From the above relationship, we infer that z(ϕ₀, k) may be considered as the predictive value percentage loss as the prevalence decreases from ϕ₀ to ϕₖ. If we consider both ϕ₀ and k as independent variables, both of which affect ρ(ϕ), we can establish the individual contributions of each variable towards ρ(ϕ) relative to each other through the partial derivatives of z(ϕ₀, k). To apply this equation in cases where the prevalence increases over time, simply flip the terms in the fraction to obtain the reciprocal of z(ϕ₀, k).
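The ratio z can be checked numerically. The sketch below assumes the standard Bayes form of the PPV and computes z both directly, as ρ(ϕ₀ − k)/ρ(ϕ₀), and through the closed form in Youden's J that follows algebraically from it, z = ϕₖ(Jϕ₀ + 1 − b) / (ϕ₀(Jϕₖ + 1 − b)). Function names and the sample inputs are hypothetical.

```python
def ppv(a, b, phi):
    """PPV from sensitivity a, specificity b and prevalence phi."""
    return a * phi / (a * phi + (1 - b) * (1 - phi))


def z_ratio(a, b, phi0, k):
    """z = rho(phi0 - k) / rho(phi0), computed directly from the PPV."""
    if not 0 < k < phi0:
        raise ValueError("expected 0 < k < phi0")
    return ppv(a, b, phi0 - k) / ppv(a, b, phi0)


def z_closed_form(a, b, phi0, k):
    """Same ratio expressed with Youden's J = a + b - 1."""
    j = a + b - 1
    phik = phi0 - k
    return phik * (j * phi0 + 1 - b) / (phi0 * (j * phik + 1 - b))


# Illustrative inputs: prevalence falling from 52% by k percentage points.
for k in (0.02, 0.22, 0.42, 0.51):
    print(k, round(z_ratio(0.95, 0.99, 0.52, k), 4),
          round(z_closed_form(0.95, 0.99, 0.52, k), 4))
```

Both columns should agree to rounding, and z should shrink as k grows, most sharply once ϕ₀ − k becomes small.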

The prevalence threshold (ϕₑ)
We previously defined the prevalence threshold as the prevalence level in the screening curve below which a screening test's positive predictive value drops most precipitously [1]. In technical terms, this is equivalent to the inflection point in the screening curve below which the rate of change of a test's positive predictive value drops at a differential pace relative to the prevalence [1]. This value, termed ϕₑ, is defined at the following point on the pre-test probability, or prevalence, axis:

$$\phi_e = \frac{\sqrt{a(-b+1)} + b - 1}{\varepsilon - 1} \tag{10}$$

The corresponding positive predictive value is given by substituting the above equation into the positive predictive value equation (Eq 3):

$$\rho(\phi_e) = \sqrt{\frac{a}{1-b}}\left[\frac{\sqrt{a(-b+1)} + b - 1}{\varepsilon - 1}\right] \tag{11}$$

Interestingly, the above expression leads to the well-known formulation for the positive predictive value as a function of prevalence and the positive likelihood ratio (LR+), defined as the sensitivity a over the complement of the specificity b [6].
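The threshold can be evaluated directly. The sketch below assumes the closed form ϕₑ = (√(a(1 − b)) + b − 1) / (a + b − 1) and the standard Bayes form of the PPV; the function names are illustrative. It also checks that the PPV at the threshold equals √(a/(1 − b)) · ϕₑ.

```python
import math


def prevalence_threshold(a, b):
    """phi_e = (sqrt(a(1 - b)) + b - 1) / (a + b - 1); undefined when a + b = 1."""
    return (math.sqrt(a * (1 - b)) + b - 1) / (a + b - 1)


def ppv(a, b, phi):
    """PPV from sensitivity a, specificity b and prevalence phi."""
    return a * phi / (a * phi + (1 - b) * (1 - phi))


a, b = 0.95, 0.99                      # illustrative test parameters
phi_e = prevalence_threshold(a, b)
ppv_direct = ppv(a, b, phi_e)          # PPV evaluated at the threshold
ppv_scaled = math.sqrt(a / (1 - b)) * phi_e
print(f"phi_e={phi_e:.4f}  PPV(phi_e)={ppv_direct:.4f}  scaled form={ppv_scaled:.4f}")
```

For a test with 95% sensitivity and 99% specificity this gives a threshold of roughly 9.3 percent.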

Using radical conjugates to obtain the Youden's J-independent equation for ϕₑ
We can use radical conjugates to further simplify the prevalence threshold equation. Let c = 1 − b, the complement of the specificity, otherwise known as the fall-out or false positive rate (FPR). The ϕₑ equation thus becomes:

$$\phi_e = \frac{\sqrt{ac} - c}{a - c}$$

Multiplying by its radical conjugate, we obtain:

$$\phi_e = \frac{(\sqrt{ac} - c)(\sqrt{ac} + c)}{(a - c)(\sqrt{ac} + c)}$$

The square difference in the numerator yields:

$$\phi_e = \frac{ac - c^2}{a\sqrt{ac} + ac - c\sqrt{ac} - c^2}$$

Factoring out c, and knowing that √x/x is equal to 1/√x, we obtain:

$$\phi_e = \frac{a - c}{a\sqrt{a/c} - \sqrt{ac} + a - c}$$

Factoring out √(ac) in the denominator's first terms leads to:

$$\phi_e = \frac{a - c}{\sqrt{ac}\left(\frac{a}{c} - 1\right) + a - c}$$

Finally, replacing 1 by c/c and factoring out the ensuing a − c term, we obtain:

$$\phi_e = \frac{a - c}{(a - c)\left(\frac{\sqrt{ac}}{c} + 1\right)} = \frac{\sqrt{c}}{\sqrt{a} + \sqrt{c}}$$

And thus, replacing c by 1 − b, the simplified version of the equation follows:

$$\phi_e = \frac{\sqrt{1-b}}{\sqrt{a} + \sqrt{1-b}}$$

From the above relationship, we can take a hypothetical scenario and identify three different points in the screening curve, notably ϕ₀, the initial prevalence; ϕₖ, the prevalence at subsequent time k; and ϕₑ, the prevalence threshold (Fig 1).
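As a quick numerical sanity check of the simplification, the sketch below compares the Youden's J form of the threshold, (√(a(1 − b)) + b − 1) / (a + b − 1), with the J-independent form √(1 − b) / (√a + √(1 − b)) over a small grid. The grid values and function names are arbitrary assumptions, chosen so that a + b ≠ 1.

```python
import math


def phi_e_original(a, b):
    """Threshold written with Youden's J in the denominator."""
    return (math.sqrt(a * (1 - b)) + b - 1) / (a + b - 1)


def phi_e_simplified(a, b):
    """Threshold after rationalising with the radical conjugate."""
    c = 1 - b  # false positive rate
    return math.sqrt(c) / (math.sqrt(a) + math.sqrt(c))


# Largest absolute disagreement between the two forms over the grid.
max_gap = 0.0
for a in (0.6, 0.8, 0.95, 0.99):
    for b in (0.6, 0.8, 0.95, 0.99):
        gap = abs(phi_e_original(a, b) - phi_e_simplified(a, b))
        max_gap = max(max_gap, gap)

print(f"largest difference between the two forms: {max_gap:.2e}")
```

The simplified form has the practical advantage of remaining well defined when a + b = 1, where the original expression is 0/0.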

Relationship between ϕ₀, ϕₖ, and ϕₑ
We can deduce important relationships between ϕ₀, ϕₖ, and ϕₑ that contextualize the screening paradox. First, we observe that although by definition ϕ₀ > ϕₖ, ϕₑ can lie either outside or in between ϕ₀ and ϕₖ. As such, three different scenarios may arise. Herein we explore each.

Second scenario: ϕ₀ > ϕₖ > ϕₑ
This scenario is akin to the first scenario in that the prevalence threshold lies outside the range between ϕ₀ and ϕₖ. However, an important difference arises. While lim_{k→0} z(ϕ) ≈ 1, by design ϕₖ > ϕₑ, and thus the maximum value that k can take cannot be greater than ϕ₀ − ϕₑ, as ϕₖ → ϕₑ. We thus infer that:

$$\lim_{k \to \phi_0 - \phi_e} z(\phi) \approx 1$$

The above relationships follow since, for ϕ > ϕₑ, dz/dϕ ≈ 0, as per Eq (11).

Third scenario: ϕ₀ > ϕₑ > ϕₖ
Perhaps the most interesting scenario is one where ϕ₀ > ϕₑ > ϕₖ. By design ϕ₀ > ϕₖ. Let ϕₑ define the prevalence threshold such that ϕ₀ > ϕₑ > ϕₖ. It then follows that ϕ₀ − ϕₖ > ϕ₀ − ϕₑ and thus 0 > ϕₖ − ϕₑ. Since ϕₖ = ϕ₀ − k, we obtain 0 > ϕ₀ − k − ϕₑ and thus:

$$k > \phi_0 - \phi_e$$

We thus infer that:

$$\lim_{k \to \phi_0 - \phi_e} z(\phi) \approx 1$$

In other words, when the initial prevalence lies beyond the prevalence threshold and the change in prevalence k approaches the difference ϕ₀ − ϕₑ, the ratio of positive predictive values as determined by the z(ϕ) function approaches 1. However, we can theorize a case where k is sufficiently large so that ϕₖ goes well below ϕₑ, and therefore z(ϕ) falls well below 1, tending to 0 as k approaches ϕ₀.
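The ordering of ϕ₀, ϕₖ and ϕₑ can be sketched as a small helper. This is an illustrative assumption rather than part of the original work: the case with the threshold above both prevalences is labelled "first" by elimination, and the handling of exact ties is an arbitrary choice.

```python
def classify_scenario(phi0, k, phi_e):
    """Return which ordering of phi0, phik = phi0 - k and phi_e applies."""
    if not 0 < k < phi0:
        raise ValueError("expected 0 < k < phi0")
    phik = phi0 - k
    if phi_e >= phi0:
        return "first"    # assumed: phi_e >= phi0 > phik (threshold above both)
    if phik >= phi_e:
        return "second"   # phi0 > phik >= phi_e (threshold below both)
    return "third"        # phi0 > phi_e > phik (threshold in between)


# Illustrative values with a threshold of about 9.3 percent.
print(classify_scenario(0.52, 0.10, 0.093))
print(classify_scenario(0.52, 0.50, 0.093))
print(classify_scenario(0.05, 0.01, 0.093))
```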

The screening paradox at the population-level
The mechanism by which the screening paradox arises is depicted through the following arrow flow diagram (Fig 2) [7][8][9]. Given the screening paradox, an increase in screening eventually leads to less, or more accurately, lower-quality screening as the prevalence drops below the threshold. This paradox is inherently insurmountable unless acted upon by a subsequent test, either the same test repeated serially or an altogether different, better diagnostic test [10]. The calculation of such cumulative PPV is a form of Bayesian updating and can be explored in previous work by the author [10]. The salient points of that theory are discussed next.

Overcoming the screening paradox
As the prevalence in a population drops with successful population-level screening and treatment, the positive predictive value of the screening test drops, and the false discovery rate, which is the complement of the positive predictive value, increases. The aforementioned paradox occurs any time that disease is successfully treated because, while dρ/dϕ decreases throughout the function's domain, it never reaches 0. In other words, the positive predictive value function is strictly increasing throughout its domain, so even minute changes in prevalence will bring about changes in the positive predictive value. That said, as we described above, the critical factor is where the initial prevalence level ϕ₀ and the subsequent prevalence level ϕₖ lie, how large their difference k is, and how they relate to the prevalence threshold ϕₑ, below which the screening paradox becomes more pronounced. In the presence of a screening paradox, it is worth considering potential solutions to overcome the losses in predictive value as the prevalence drops. Though many options exist, the most logical step would be to undertake serial testing, be it with the same test repeated or with an alternative test altogether. Herein we explore both scenarios.

Repeated testing with a single test
We have shown in previous work that a screening test carried out serially improves the overall positive predictive value when each individual test iteration is positive [10]. The number of serial iterations nᵢ required to achieve a desired PPV ρ at prevalence ϕ is given by the following ceiling function:

$$n_i = \left\lceil \frac{\ln\left[\frac{\rho\,(\phi - 1)}{\phi\,(\rho - 1)}\right]}{\ln\left[\frac{a}{1-b}\right]} \right\rceil$$

The key question then becomes: how many serial positive tests are needed to mitigate or reverse the effect of the screening paradox when ϕₖ < ϕₑ, that is, to achieve a positive predictive value comparable to that under ϕ₀? Given the geometry of the screening curve, the answer should be that the PPV ought to at least attain the level at the prevalence threshold, as described in the third scenario in section 4 of this manuscript, so that z(ϕ) ≈ 1. We can calculate the number of iterations nᵢϕₑ needed using the formula above, by plugging ρ(ϕₑ) as defined in Eq (11) into ρ:

$$n_{i\phi_e} = \left\lceil \frac{\ln\left[\frac{\rho(\phi_e)\,(\phi_k - 1)}{\phi_k\,(\rho(\phi_e) - 1)}\right]}{\ln\left[\frac{a}{1-b}\right]} \right\rceil$$

The above expression can be simplified by denoting the square root of the positive likelihood ratio, √(a/(1 − b)), as ω, with ϕₖ the prevalence at subsequent time k, such that:

$$n_{i\phi_e} = \left\lceil \frac{\ln\left[\frac{\omega\,(1 - \phi_k)}{\phi_k}\right]}{2\ln\omega} \right\rceil$$

We take the ceiling function of the above equation to ensure that we obtain an integer number of positive test iterations (PTI) needed to surpass the prevalence threshold [10].
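The iteration count can be sketched as follows. The code assumes that each positive result updates the pre-test probability in Bayesian fashion with the same sensitivity a and specificity b, which gives n = ⌈ln[ρ(1 − ϕ) / (ϕ(1 − ρ))] / ln[a/(1 − b)]⌉ for a target PPV ρ. Setting ρ to the PPV at the prevalence threshold reduces this to ⌈ln[ω(1 − ϕₖ)/ϕₖ] / (2 ln ω)⌉ with ω = √(a/(1 − b)). Function names are hypothetical.

```python
import math


def iterations_for_target(a, b, phi, rho_target):
    """Serial positive tests needed to reach PPV rho_target from prevalence phi."""
    odds_ratio = rho_target * (1 - phi) / (phi * (1 - rho_target))
    return math.ceil(math.log(odds_ratio) / math.log(a / (1 - b)))


def pti_to_threshold(a, b, phi_k):
    """Positive test iterations needed to reach the PPV at the prevalence threshold."""
    omega = math.sqrt(a / (1 - b))  # square root of the positive likelihood ratio
    return math.ceil(math.log(omega * (1 - phi_k) / phi_k) / (2 * math.log(omega)))


# Illustrative test: 95% sensitivity, 99% specificity.
for phi_k in (0.0001, 0.05, 0.3):
    print(phi_k, pti_to_threshold(0.95, 0.99, phi_k))
```

For prevalences far above the threshold the unclamped formula can return 0 or a negative integer, so a caller may wish to enforce a minimum of one test.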

Using a different screening test
We can likewise reverse the effects of the screening paradox by using a different screening test altogether, which is the most common scenario in clinical practice today. That said, it would be impractical to determine the number of iterations of a different test for numerous reasons. First, different tests would have different sensitivity/specificity parameters, so a third test may then be needed in the rare scenario where two different positive ones are insufficient, rendering the notion of iteration inadequate. Likewise, and perhaps more importantly, there may not be an alternative screening test for a particular condition at all, so the above exercise may be moot.

An SIR model without vital dynamics
To better illustrate the ideas above, we can take an infectious disease we shall call X for simplicity's sake. The condition need not be infectious in nature, but an infectious agent lends itself well to the application of the concepts herein described. To estimate how the prevalence of disease X changes over time during a community outbreak in a closed system, we can construct a theoretical SIR model [11]. An SIR model is an epidemiological model that computes the theoretical number of people infected with a contagious illness in a closed population over time. The name of this class of models derives from the fact that they involve coupled ordinary differential equations relating the number of susceptible people S(t), the number of people infected I(t), and the number of people who have recovered R(t) over time t [12]. One of the simplest SIR models is the Kermack-McKendrick model [13]. The dynamics of an infectious epidemic, such as that of X, are often much faster than the dynamics of birth and death due to causes other than X; therefore, birth and death are often omitted in simple compartmental models. Otherwise stated, the population remains relatively stable over time t, so that the individual compartments can change but the total number of individuals remains stable, as follows:

$$S(t) + I(t) + R(t) = N$$

where N is some constant.
The SIR system can be expressed by the following set of ordinary differential equations:

$$\frac{dS}{dt} = -\frac{\beta S I}{N}$$

$$\frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I \tag{34}$$

$$\frac{dR}{dt} = \gamma I$$

where β is the average infection rate, γ is the recovery rate, S is the proportion of susceptible population, I is the proportion of infected, R is the proportion of removed population (either by death or recovery), and N is the sum of the latter three. From the above equations we obtain the basic reproduction number (R₀) as a ratio of infection-to-recovery rates [14]:

$$R_0 = \frac{\beta}{\gamma}$$

The ensuing equation to determine the number of susceptible individuals as a function of time becomes:

$$S(t) = S(0)\, e^{-R_0 \left(R(t) - R(0)\right)/N}$$

where S(0) and R(0) are the initial numbers of susceptible and recovered/dead subjects, respectively. The graphical representation of the above dynamics over some time t is seen in Fig 3. For the purpose of the screening paradox, we need only focus on Eq 34, the rate of change of active infections dI/dt, which as a rate reflects the incidence of disease, but as an absolute value at a specific time t yields the prevalence of disease X at that point in time (Fig 4). We can use this value to determine how the PPV fluctuates over time. Since the PPV depends on time only through the prevalence, the differential equation relating the changes in PPV over time t therefore becomes:

$$\frac{d\rho}{dt} = \frac{d\rho}{d\phi} \cdot \frac{d\phi}{dt}$$

Towards this end, let us assume that the condition X has a screening test with excellent sensitivity and specificity parameters of 95 and 99 percent, respectively. According to Eq 10, this test would have a prevalence threshold of 9.3 percent. Using this threshold, we can illustrate the full SIR model as follows (Fig 4).
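The dynamics described above can be sketched with a simple forward-Euler integration of the standard SIR equations without vital dynamics (dS/dt = −βSI/N, dI/dt = βSI/N − γI, dR/dt = γI), tracking the PPV of the 95%/99% test as the infected proportion changes. The values of β, γ, the initial conditions and the step size are hypothetical and are not the parameters behind the paper's figures or Table 1.

```python
def ppv(a, b, phi):
    """PPV from sensitivity a, specificity b and prevalence phi."""
    return a * phi / (a * phi + (1 - b) * (1 - phi))


def simulate_sir(beta, gamma, s0, i0, r0, weeks, steps_per_week=100):
    """Forward-Euler SIR integration; returns weekly (week, S, I, R) tuples."""
    n = s0 + i0 + r0                  # constant total population
    dt = 1.0 / steps_per_week
    s, i, r = s0, i0, r0
    out = [(0, s, i, r)]
    for week in range(1, weeks + 1):
        for _ in range(steps_per_week):
            new_inf = beta * s * i / n * dt   # S -> I flow in this step
            new_rec = gamma * i * dt          # I -> R flow in this step
            s -= new_inf
            i += new_inf - new_rec
            r += new_rec
        out.append((week, s, i, r))
    return out


# Hypothetical outbreak expressed in proportions (N = 1).
traj = simulate_sir(beta=0.9, gamma=0.2, s0=0.999, i0=0.001, r0=0.0, weeks=30)
for week, s, i, r in traj:
    print(f"week {week:2d}  prevalence={i:.3f}  PPV={ppv(0.95, 0.99, i):.3f}")
```

In this sketch the PPV rises with the epidemic curve, plateaus while the prevalence is high, and falls again as the infected proportion declines, which is the qualitative pattern the text describes.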
Note the rise of the PPV (purple) as a function of prevalence (red). The delineation of the prevalence threshold (PT) at week 10 shows a corresponding flattening of the PPV, which holds steady almost horizontally. We thus observe that once the prevalence crosses the PT, the test performs well, with greater than 90 percent predictive value. However, once the prevalence drops below the PT again, around week 19, the PPV begins to drop anew. Of note, this is a consequence of the success of the screening test in the first place, leading to the accurate detection of disease in a higher proportion of individuals once the prevalence threshold has been crossed, and to people being adequately treated or quarantined to prevent further spread. In other words, as stated before, by performing and succeeding at the very task it was developed to do, a screening test paradoxically reduces its predictive ability to correctly identify individuals with the disease it screens for in the future. The degree to which this paradoxical effect is observed depends on where we set our original prevalence, ϕ₀, since z(ϕ) depends on both ϕ₀ and k.
Likewise, we can use the SIR model to demonstrate the dynamics of the epidemic of disease X in real time, numerically, as observed in Table 1. The disease manifests over a number of weeks, affecting a peak of 52 percent of this population by week 14. Because implicit to the paradox is the fact that ϕ₀ > ϕₖ, let us for argument's sake take the maximum prevalence as the starting prevalence point ϕ₀, though this need not necessarily be the case since the principles described in this work apply regardless of ϕ₀. The time k thus corresponds to ϕₖ each subsequent week. The corresponding PPV and the ensuing z(ϕ₀, k) values can be seen in Table 1. Note that since our test has a sensitivity of 0.95 and a specificity of 0.99, Youden's J statistic equals 0.95 + 0.99 − 1 = 0.94. Finally, we take the ceiling function of the iteration number n to ensure that we obtain an integer number of positive test iterations (PTI) needed to surpass the prevalence threshold, thus enhancing the reliability of the screening process. Note that at the extremes of prevalence we would need to obtain 3 serial positive tests to achieve a PPV similar to that beyond the prevalence threshold. Once that threshold has been crossed, by definition nᵢϕₑ = 1. As noted above, other than developing newer, better screening tests, serial testing is one way to overcome the screening paradox [15], be it with the same test done repeatedly or using a second, different diagnostic test altogether.

Table 1. Model for a screening test with 95% sensitivity and 99% specificity over time t. PPV = positive predictive value, z(ϕ) is the PPV ratio between ϕ₀ and ϕₖ, N = number of iterations to overcome the screening paradox (S1 Data). https://doi.org/10.1371/journal.pone.0256645.t001
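A table of this kind can be sketched as below for the 95%/99% test with ϕ₀ = 0.52. The list of prevalences is hypothetical and is not the data in Table 1 or S1 Data. The PPV uses the standard Bayes form, z is the PPV ratio relative to ϕ₀, and the iteration count uses ⌈ln[ω(1 − ϕₖ)/ϕₖ] / (2 ln ω)⌉ with ω = √(a/(1 − b)), clamped to a minimum of 1 on the assumption that at least one positive test is always required.

```python
import math


def ppv(a, b, phi):
    """PPV from sensitivity a, specificity b and prevalence phi."""
    return a * phi / (a * phi + (1 - b) * (1 - phi))


def pti(a, b, phi_k):
    """Positive test iterations to reach the threshold PPV (minimum of 1)."""
    omega = math.sqrt(a / (1 - b))
    n = math.ceil(math.log(omega * (1 - phi_k) / phi_k) / (2 * math.log(omega)))
    return max(1, n)


def paradox_table(a, b, phi0, prevalences):
    """Rows of (phi_k, PPV, z relative to phi0, iterations N)."""
    rho0 = ppv(a, b, phi0)
    rows = []
    for phi_k in prevalences:
        rho_k = ppv(a, b, phi_k)
        rows.append((phi_k, rho_k, rho_k / rho0, pti(a, b, phi_k)))
    return rows


# Hypothetical prevalences, starting from the assumed peak of 52 percent.
rows = paradox_table(0.95, 0.99, 0.52, [0.52, 0.30, 0.10, 0.05, 0.01, 0.001])
print("phi_k    PPV      z        N")
for phi_k, rho_k, z, n in rows:
    print(f"{phi_k:<8} {rho_k:<8.4f} {z:<8.4f} {n}")
```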

Conclusion
In this manuscript, we explore the mathematical model which formalizes the screening paradox and explore its implications for population-level screening programs in the three possible scenarios, each as a function of the position of the initial prevalence of a condition relative to the prevalence threshold level of its screening test. Likewise, we provide a mathematical model to determine the predictive value percentage loss as the prevalence decreases and define the number of positive test iterations (PTI) needed to reverse the effects of the paradox when a single test is undertaken serially. Given their theoretical nature, clinical application of the concepts herein reported needs validation prior to implementation. Meanwhile, an understanding of how the dynamics of prevalence can affect the PPV over time can help inform clinicians as to the reliability of a screening test's results.
Supporting information
S1 Data. Supplemental file.