Abstract
Bayes’ Theorem imposes inevitable limitations on the accuracy of screening tests by tying the test’s predictive value to the disease prevalence. This limitation is independent of the adequacy and make-up of the test and thus implies inherent Bayesian limitations to the screening process itself. As per the WHO’s Wilson–Jungner criteria, one of the prerequisite steps before undertaking screening is to ensure that a treatment for the condition screened for exists. However, when screening programs are applied in closed systems, a paradox, henceforth termed the “screening paradox”, ensues. If a disease process is screened for and subsequently treated, its prevalence drops in the population, which, as per Bayes’ theorem, makes the test’s predictive value drop in turn. Put another way, a very powerful screening test would, by performing and succeeding at the very task it was developed to do, paradoxically reduce its ability to correctly identify individuals with the disease it screens for in the future, over some time t. In this manuscript, we explore the mathematical model which formalizes said screening paradox and its implications for population-level screening programs. In particular, we define the number of positive test iterations (PTI) needed to reverse the effects of the paradox. Given their theoretical nature, clinical application of the concepts herein reported needs validation prior to implementation. Meanwhile, an understanding of how the dynamics of prevalence can affect the PPV over time can help inform clinicians as to the reliability of a screening test’s results.
Citation: Balayla J (2021) On the formalism of the screening paradox. PLoS ONE 16(9): e0256645. https://doi.org/10.1371/journal.pone.0256645
Editor: Alan D. Hutson, Roswell Park Cancer Institute, UNITED STATES
Received: November 27, 2020; Accepted: April 29, 2021; Published: September 1, 2021
Copyright: © 2021 Jacques Balayla. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
1.1 Bayes’ Theorem and predictive values
Bayes’ Theorem describes the probability of an event occurring based on prior knowledge of conditions related to that specific event [1]. The essence of the Bayesian approach is to provide a mathematical model explaining how existing beliefs change in light of new evidence [2]. Bayes’ theorem has applications in innumerable fields and, not surprisingly, has significant implications in epidemiological modelling as well. From Bayes’ theorem, we can derive the positive predictive value ρ(ϕ) (PPV) of a screening test, defined as the percentage of patients with a positive screening test who do in fact have the disease screened for, as follows [1]:
$$\rho(\phi) = \frac{a\phi}{a\phi + (1-b)(1-\phi)} \tag{1}$$
where ρ(ϕ) = PPV, a = sensitivity, b = specificity and ϕ = prevalence.
The PPV ρ(ϕ) is therefore a function of the disease prevalence, ϕ. As the prevalence increases, ρ(ϕ) also increases and vice-versa [3].
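The dependence of the PPV on prevalence can be checked numerically. The following is a minimal sketch, assuming the standard Bayesian form of the PPV in terms of sensitivity a, specificity b and prevalence ϕ; the function name `ppv` and the prevalence values in the loop are illustrative choices, not taken from the paper.

```python
def ppv(a: float, b: float, phi: float) -> float:
    """Positive predictive value for sensitivity a, specificity b, prevalence phi."""
    true_positives = a * phi                      # diseased and test-positive
    false_positives = (1.0 - b) * (1.0 - phi)     # healthy but test-positive
    return true_positives / (true_positives + false_positives)


# Illustrative (assumed) test parameters and prevalence levels.
a, b = 0.95, 0.99
for phi in (0.001, 0.01, 0.1, 0.5):
    print(f"prevalence = {phi:<6} PPV = {ppv(a, b, phi):.3f}")
```

With a fixed sensitivity and specificity, the printed PPV rises monotonically with the prevalence, which is the behaviour the text describes.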
2 Bayesian dynamics of predictive values
Let us define a hypothetical disease present in a population. Said condition has a preclinical phase and is amenable to screening through a given test designed to detect it. The test therefore has all of the pertinent screening parameters: sensitivity, specificity, and negative and positive predictive values. Finally, as required by the WHO’s Wilson–Jungner criteria [4], a treatment for the condition screened exists. Let us denote ϕ0 as the original or initial prevalence before screening is undertaken. As per Eq (1), we thus obtain a positive predictive value ρ(ϕ0) at t0:
$$\rho(\phi_0) = \frac{a\phi_0}{a\phi_0 + (1-b)(1-\phi_0)} \tag{2}$$
Assuming no new cases, it follows that as individuals with a positive screening test are treated, the disease prevalence ϕ0 drops by some magnitude k, the absolute reduction in prevalence. Consequently, the predictive value ρ(ϕ) drops as well, so that individuals who test positive at some time tk > 0 experience a positive predictive value of:
$$\rho(\phi_0 - k) = \frac{a(\phi_0 - k)}{a(\phi_0 - k) + (1-b)\left[1-(\phi_0 - k)\right]} \tag{3}$$
From the above equations, we define the ratio ζ as follows:
$$\zeta = \frac{\rho(\phi_0 - k)}{\rho(\phi_0)} \tag{4}$$
where 0 < ζ < 1 and k > 0. Given the shape of the screening curve, and the principle of the prevalence threshold, even small changes in the prevalence ϕ can produce significant changes in the positive predictive value ρ(ϕ). To determine the degree of reduction in the predictive value of the screening test at time tk, we take the ratio of ρ(ϕ) at two different times: t0, and some later time tk with a reduced prevalence of ϕ0 − k, where k < ϕ0 is the absolute reduction in prevalence:
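$$\zeta(\phi_0, k) = \frac{\rho(\phi_0 - k)}{\rho(\phi_0)} = \frac{\dfrac{a(\phi_0 - k)}{a(\phi_0 - k) + (1-b)\left[1-(\phi_0 - k)\right]}}{\dfrac{a\phi_0}{a\phi_0 + (1-b)(1-\phi_0)}} \tag{5}$$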
Since ϕ0 − k yields a new, lower prevalence ϕk, we can re-write the above equation as:
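$$\zeta(\phi_0, k) = \frac{\rho(\phi_k)}{\rho(\phi_0)} = \frac{a\phi_k\left[a\phi_0 + (1-b)(1-\phi_0)\right]}{a\phi_0\left[a\phi_k + (1-b)(1-\phi_k)\right]} \tag{6}$$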
Expanding the parentheses and simplifying the expression where the sum of the sensitivity a and specificity b is defined by ε = a + b, we obtain:
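$$\zeta(\phi_0, k) = \frac{\phi_k\left[\phi_0(\varepsilon - 1) + 1 - b\right]}{\phi_0\left[\phi_k(\varepsilon - 1) + 1 - b\right]} \tag{7}$$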
The term ε − 1 = a + b − 1 has been previously defined in the context of receiver-operating characteristic (ROC) curves, and is termed Youden’s J statistic [5]. As such, we can re-write the above equation as:
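$$\zeta(\phi_0, k) = \frac{\phi_k\left(\phi_0 J + 1 - b\right)}{\phi_0\left(\phi_k J + 1 - b\right)} \tag{8}$$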
From the above relationship, we infer:
(9)
ζ(ϕ0, k) may be considered as a measure of the predictive value percentage loss as the prevalence decreases from ϕ0 to ϕk, the loss itself being 1 − ζ(ϕ0, k). If we consider both ϕ0 and k as independent variables, both of which affect ρ(ϕ), we can establish the individual contributions of each variable towards ρ(ϕ) relative to each other through the partial derivatives of ζ(ϕ0, k). To apply this equation in cases where the prevalence increases over time, simply invert the fraction to obtain the reciprocal of ζ(ϕ0, k).
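As a numerical check on the ratio ζ, the sketch below computes it in two ways: directly as a ratio of two PPVs, and through an algebraically equivalent closed form written with Youden’s J. The closed form used here is one rearrangement consistent with the definitions in the text, and the function names and example values are assumptions made for illustration.

```python
def ppv(a: float, b: float, phi: float) -> float:
    """Positive predictive value for sensitivity a, specificity b, prevalence phi."""
    return a * phi / (a * phi + (1.0 - b) * (1.0 - phi))


def zeta_direct(phi0: float, k: float, a: float, b: float) -> float:
    """Ratio of the PPV at the reduced prevalence phi0 - k to the PPV at phi0."""
    return ppv(a, b, phi0 - k) / ppv(a, b, phi0)


def zeta_youden(phi0: float, k: float, a: float, b: float) -> float:
    """Same ratio, written with Youden's J = a + b - 1 (sensitivity cancels out)."""
    j = a + b - 1.0
    phi_k = phi0 - k
    return (phi_k * (phi0 * j + 1.0 - b)) / (phi0 * (phi_k * j + 1.0 - b))


# Illustrative (assumed) values: prevalence falls from 20% to 5%.
a, b, phi0, k = 0.95, 0.99, 0.20, 0.15
print(zeta_direct(phi0, k, a, b), zeta_youden(phi0, k, a, b))
```

Both functions return the same value up to floating-point error, and the result lies strictly between 0 and 1 whenever 0 < k < ϕ0.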
3 The prevalence threshold (ϕe)
We previously defined the prevalence threshold as the prevalence level in the screening curve below which the positive predictive value of a screening test drops most precipitously [1]. In technical terms, this is equivalent to the inflection point in the screening curve below which the rate of change of a test’s positive predictive value drops at a differential pace relative to the prevalence [1]. This value, termed ϕe, is defined at the following point on the pre-test probability, or prevalence, axis:
$$\phi_e = \frac{\sqrt{a(1-b)} + b - 1}{a + b - 1} \tag{10}$$
The corresponding positive predictive value is given by substituting the above equation into the positive predictive value equation (Eq 1):
$$\rho(\phi_e) = \frac{\sqrt{\dfrac{a}{1-b}}}{\sqrt{\dfrac{a}{1-b}} + 1} \tag{11}$$
Interestingly, the above expression leads to the well-known formulation for the positive predictive value as a function of prevalence and the positive likelihood ratio (LR+), defined as the sensitivity a over the complement of the specificity b [6].
(12)
3.1 Using radical conjugates to obtain the Youden’s J-independent equation for ϕe
We can use radical conjugates to further simplify the prevalence threshold equation. Let c = 1-b, the complement of the specificity, otherwise known as the fall-out or false positive rate (FPR). The ϕe equation thus becomes:
$$\phi_e = \frac{\sqrt{ac} - c}{a - c} \tag{13}$$
Multiplying by its radical conjugate, we obtain:
$$\phi_e = \frac{\sqrt{ac} - c}{a - c} \cdot \frac{\sqrt{ac} + c}{\sqrt{ac} + c} \tag{14}$$
The square difference in the numerator yields:
$$\phi_e = \frac{ac - c^2}{(a - c)\left(\sqrt{ac} + c\right)} \tag{15}$$
Factoring out c, and knowing that is equal to
we obtain:
(16)
Factoring out in the denominator’s first terms leads to:
(17)
Finally, replacing 1 by c/c and factoring out the ensuing a-c term, we obtain:
(18)
And thus, replacing c by 1-b the simplified version of the equation follows:
$$\phi_e = \frac{\sqrt{1-b}}{\sqrt{a} + \sqrt{1-b}} \tag{19}$$
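One shorter route to the same simplified form, offered here as an illustrative alternative rather than as the author’s own sequence of steps, is to factor the numerator and the denominator directly. With c = 1 − b as above, and assuming a ≠ c so that the common factor can be cancelled:

```latex
\begin{aligned}
\phi_e &= \frac{\sqrt{ac} - c}{a - c}
        = \frac{\sqrt{c}\,\left(\sqrt{a} - \sqrt{c}\right)}
               {\left(\sqrt{a} - \sqrt{c}\right)\left(\sqrt{a} + \sqrt{c}\right)} \\
       &= \frac{\sqrt{c}}{\sqrt{a} + \sqrt{c}}
        = \frac{\sqrt{1-b}}{\sqrt{a} + \sqrt{1-b}}
\end{aligned}
```

The first step uses the difference of squares a − c = (√a − √c)(√a + √c), so no explicit multiplication by the radical conjugate is needed.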
From the above relationship, we can take a hypothetical scenario and identify three different points in the screening curve, notably ϕ0—the initial prevalence, ϕk—the prevalence at subsequent time k, and ϕe—the prevalence threshold (Fig 1).
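The two forms of the prevalence threshold can be compared numerically. The sketch below assumes the form with Youden’s J in the denominator and the simplified, J-independent form; the function names are hypothetical, and the test parameters (sensitivity 0.95, specificity 0.99) are those of the worked example later in the paper.

```python
import math


def prevalence_threshold(a: float, b: float) -> float:
    """Prevalence threshold written with Youden's J = a + b - 1 in the denominator."""
    return (math.sqrt(a * (1.0 - b)) + b - 1.0) / (a + b - 1.0)


def prevalence_threshold_simplified(a: float, b: float) -> float:
    """J-independent form, with c = 1 - b the false positive rate."""
    c = 1.0 - b
    return math.sqrt(c) / (math.sqrt(a) + math.sqrt(c))


a, b = 0.95, 0.99
print(f"{prevalence_threshold(a, b):.4f}")
print(f"{prevalence_threshold_simplified(a, b):.4f}")
```

Both expressions evaluate to roughly 0.093 for these parameters, in line with the 9.3 percent threshold quoted in the SIR example.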
4 Relationship between ϕ0, ϕk, and ϕe
We can deduce important relationships between ϕ0, ϕk, and ϕe that contextualize the screening paradox. First, we observe that though by definition ϕ0 > ϕk, ϕe can lie either outside or in between ϕ0 and ϕk. As such, three different scenarios may arise. Herein we explore each.
4.1 First scenario: ϕe > ϕ0 > ϕk
By design, ϕ0 > ϕk. Let ϕe define the prevalence threshold such that ϕe > ϕ0 > ϕk. It then follows that ϕe − ϕk > ϕ0 − ϕk and thus ϕe − ϕ0 > 0. Since ϕ0 = ϕk + k, we obtain ϕe − ϕk − k > 0 or otherwise stated, ϕe > ϕk + k and thus ϕe − ϕk > k. We thus infer that:
(20)
and
(21)
4.2 Second scenario: ϕ0 > ϕk > ϕe
This scenario is akin to the first scenario in that the prevalence threshold lies outside the range between ϕ0 and ϕk. However, an important difference arises. While lim_{k→0} ζ(ϕ) ∼ 1, by design ϕk > ϕe, and thus the maximum value that k can take cannot be greater than ϕ0 − ϕe, as ϕk → ϕe. We thus infer that:
(22)
and
(23)
The above relationships follow since for ϕ > ϕe → dζ/dϕ ∼ 0, as per Eq (11).
4.3 Third scenario: ϕ0 > ϕe > ϕk
Perhaps the most interesting scenario is one where ϕ0 > ϕe > ϕk. By design ϕ0 > ϕk. Let ϕe define the prevalence threshold such that ϕ0 > ϕe > ϕk. It then follows that ϕ0 − ϕk > ϕ0 − ϕe and thus 0 > ϕk − ϕe. Since ϕk = ϕ0 − k, we obtain 0 > ϕ0 − k − ϕe and thus:
(24)
In other words, when the initial prevalence lies above the prevalence threshold and the prevalence changes such that k approaches the difference ϕ0 − ϕe, the ratio of positive predictive values as determined by the ζ(ϕ) function approaches 1. However, we can theorize a case where k is sufficiently large that ϕk falls well below ϕe and therefore:
(26)
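The three scenarios can be explored numerically. The following sketch classifies a pair (ϕ0, ϕk) relative to ϕe and reports the corresponding ratio ζ; the helper names, the handling of boundary equalities, and the example prevalences are assumptions made for illustration only.

```python
import math


def ppv(a: float, b: float, phi: float) -> float:
    """Positive predictive value for sensitivity a, specificity b, prevalence phi."""
    return a * phi / (a * phi + (1.0 - b) * (1.0 - phi))


def prevalence_threshold(a: float, b: float) -> float:
    """Simplified prevalence threshold, sqrt(1-b) / (sqrt(a) + sqrt(1-b))."""
    c = 1.0 - b
    return math.sqrt(c) / (math.sqrt(a) + math.sqrt(c))


def classify_scenario(phi0: float, phi_k: float, phi_e: float) -> str:
    """Return which of the three orderings of phi0, phi_k and phi_e applies."""
    if not phi0 > phi_k:
        raise ValueError("the screening paradox assumes phi0 > phi_k")
    if phi_e > phi0:
        return "first"    # phi_e > phi0 > phi_k
    if phi_k > phi_e:
        return "second"   # phi0 > phi_k > phi_e
    return "third"        # phi0 >= phi_e >= phi_k (equalities grouped here by choice)


a, b = 0.95, 0.99
phi_e = prevalence_threshold(a, b)
# Hypothetical (phi0, phi_k) pairs, one per scenario.
for phi0, phi_k in ((0.05, 0.01), (0.50, 0.20), (0.50, 0.01)):
    zeta = ppv(a, b, phi_k) / ppv(a, b, phi0)
    print(classify_scenario(phi0, phi_k, phi_e), f"zeta = {zeta:.3f}")
```

For these assumed values, ζ stays close to 1 only in the second scenario, where both prevalences remain above the threshold; once ϕk falls below ϕe, the ratio drops markedly.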
5 The screening paradox at the population-level
The mechanism by which the screening paradox arises is depicted through the following arrow flow diagram (Fig 2) [7–9].
Given the screening paradox, an increase in screening eventually leads to less, or more accurately, lower quality screening as the prevalence drops below the threshold. This paradox is inherently insurmountable unless acted upon by a subsequent test, either the same test repeated serially or an altogether different, better, diagnostic test [10]. The calculation of such a cumulative PPV is a form of Bayesian updating and is explored in previous work by the author [10]. The salient points of that theory are further discussed next.
6 Overcoming the screening paradox
As the prevalence in a population drops with successful population-level screening and treatment, the positive predictive value of the screening test drops, and the false discovery rate, which is equivalent to the complement of the positive predictive value, increases. The aforementioned paradox occurs any time that disease is successfully treated because, while dρ/dϕ drops throughout the function’s domain, it never reaches 0. In other words, the positive predictive value function always increases throughout its domain, so even minute changes in prevalence will bring about changes in the positive predictive value. That said, as we described above, the critical factor is where the initial prevalence level ϕ0, the subsequent prevalence level ϕk, and their difference k lie relative to the prevalence threshold ϕe, below which the screening paradox becomes more pronounced. In the presence of a screening paradox it is worth considering potential solutions to overcome the losses in predictive value as the prevalence drops. Though many options exist, the most logical step would be to undertake serial testing, be it with the same test undertaken serially or an alternative test altogether. Herein we explore both scenarios.
6.1 Repeated testing with a single test
We have shown in previous work that a screening test carried out serially improves the overall positive predictive value when each individual test iteration is positive [10]. The number of serial iterations ni required to achieve a desired ρ(ϕ) is given by the following ceiling function:
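$$n_i = \left\lceil \frac{\ln\left[\dfrac{\rho\,(1-\phi)}{\phi\,(1-\rho)}\right]}{\ln\left[\dfrac{a}{1-b}\right]} \right\rceil \tag{27}$$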
The key question then becomes: how many serial positive tests are needed to mitigate or reverse the effect of the screening paradox when ϕk < ϕe? In other words, how many are needed to achieve a positive predictive value comparable to that under ϕ0? Given the geometry of the screening curve, the answer should be that the PPV ought to at least attain the level at the prevalence threshold, as described in the third scenario in section 4 of this manuscript, so that ζ(ϕ) ∼ 1. We can calculate the number of iterations needed using the formula above, by substituting ρ(ϕe), as defined in Eq (11), for ρ, where ρ(ϕe) = √(a/(1 − b)) / (√(a/(1 − b)) + 1).
(28)
The above expression can be simplified by denoting the square root of the positive likelihood ratio as ω, with ϕk the prevalence at subsequent time tk, such that:
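$$n_i = \left\lceil \frac{\ln\left[\dfrac{\omega\,(1-\phi_k)}{\phi_k}\right]}{2\ln\omega} \right\rceil \tag{29}$$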
We take the ceiling function of the above equation to ensure that we obtain an integer number of positive test iterations (PTI) needed to surpass the prevalence threshold [10].
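The iteration count can be sketched in code through Bayesian odds updating: each positive result multiplies the prior odds by the positive likelihood ratio, so n positive results multiply them by LR+ to the power n. The sketch below assumes this odds form and assumes that the target PPV at the prevalence threshold equals ω/(ω + 1), with ω the square root of LR+; the function names and the example prevalences are hypothetical.

```python
import math


def iterations_for_target(a: float, b: float, phi: float, rho_target: float) -> int:
    """Serial positive tests needed for the PPV to reach rho_target from prevalence phi."""
    lr_pos = a / (1.0 - b)                       # positive likelihood ratio
    prior_odds = phi / (1.0 - phi)
    target_odds = rho_target / (1.0 - rho_target)
    # Solve prior_odds * lr_pos**n = target_odds for n, then round up.
    return math.ceil(math.log(target_odds / prior_odds) / math.log(lr_pos))


def positive_test_iterations(a: float, b: float, phi_k: float) -> int:
    """PTI needed to reach the PPV at the prevalence threshold (assumed omega / (omega + 1))."""
    omega = math.sqrt(a / (1.0 - b))
    return math.ceil(math.log(omega * (1.0 - phi_k) / phi_k) / (2.0 * math.log(omega)))


a, b = 0.95, 0.99
for phi_k in (0.2, 0.01, 0.0001):   # illustrative prevalences
    print(phi_k, positive_test_iterations(a, b, phi_k))
```

For prevalences at or above roughly ω/(ω + 1), the unrounded value is zero or negative; how to treat that case (for example, clamping to one test) is a design choice this sketch leaves open.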
6.2 Using a different screening test
We can likewise reverse the effects of the screening paradox by using a different screening test altogether, which is the most common scenario in clinical practice today. That said, it would be impractical to determine the number of iterations of a different test for numerous reasons. First, different tests would have different sensitivity/specificity parameters, so a third test may then be needed in the rare scenario where two different positive ones are insufficient, rendering the notion of iteration inadequate. Likewise, and perhaps more importantly, there may not be an alternative screening test for a particular condition at all, so the above exercise may be moot.
7 An SIR model without vital dynamics
To better illustrate the ideas above, we can take an infectious disease we shall call X for simplicity’s sake. The condition need not be infectious in nature, but an infectious agent lends itself well to the application of the concepts herein described. To estimate how the prevalence of disease X changes over time during a community outbreak in a closed system, we can construct a theoretical SIR model [11]. An SIR model is an epidemiological model that computes the theoretical number of people infected with a contagious illness in a closed population over time. The name of this class of models derives from the fact that they involve coupled ordinary differential equations relating the number of susceptible people S(t), the number of people infected I(t), and the number of people who have recovered R(t) over time t [12]. One of the simplest SIR models is the Kermack-McKendrick model [13]. The dynamics of an infectious epidemic, such as that of X, are often much faster than the dynamics of birth and death due to causes other than X; therefore, birth and death are often omitted in simple compartmental models. Otherwise stated, the individual compartments can change over time t, but the total number of individuals remains stable, as follows:
$$S(t) + I(t) + R(t) = N \tag{30}$$
where N is some constant.
The SIR system can be expressed by the following set of ordinary differential equations:
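$$\frac{dS}{dt} = -\frac{\beta I S}{N} \tag{31}$$
$$\frac{dI}{dt} = \frac{\beta I S}{N} - \gamma I \tag{32}$$
$$\frac{dR}{dt} = \gamma I \tag{33}$$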
where β is the average infection rate, γ is the recovery rate, S is the proportion of susceptible population, I is the proportion of infected, R is the proportion of removed population (either by death or recovery), and N is the sum of the latter three. From the above equations we obtain the basic reproduction number (R0) as a ratio of infection-to-recovery rates [14]:
$$R_0 = \frac{\beta}{\gamma} \tag{34}$$
The ensuing equation to determine the number of susceptible individuals as a function of time becomes:
$$S(t) = S(0)\,e^{-R_0\left(R(t) - R(0)\right)/N} \tag{35}$$
where S(0) and R(0) are the initial numbers of susceptible and recovered/dead subjects, respectively.
The graphical representation of the above dynamics over some time t is seen in Fig 3.
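The dynamics described above can be reproduced with a short numerical integration. The sketch below uses a plain forward Euler scheme, chosen for simplicity rather than because the paper uses it, on the standard SIR equations without vital dynamics; the parameter values (β = 0.5, γ = 0.1, N = 1000, one initial infection) and all names are hypothetical.

```python
def simulate_sir(beta, gamma, s0, i0, rec0, t_end, dt=0.1):
    """Integrate the SIR equations with forward Euler; returns (t, S, I, R) tuples."""
    n = s0 + i0 + rec0                        # total population, constant over time
    s, i, r = float(s0), float(i0), float(rec0)
    history = [(0.0, s, i, r)]
    steps = int(round(t_end / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / n * dt    # flow from S to I
        new_removals = gamma * i * dt             # flow from I to R
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((step * dt, s, i, r))
    return history


# Hypothetical outbreak: basic reproduction number beta / gamma = 5.
history = simulate_sir(beta=0.5, gamma=0.1, s0=999.0, i0=1.0, rec0=0.0, t_end=100.0)
population = 1000.0
peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"peak prevalence {peak_infected / population:.2f} at t = {peak_time:.1f}")
```

The infected fraction I(t)/N at each time step plays the role of the prevalence ϕ, and can be fed into a PPV calculation to trace how the predictive value rises and falls over the course of the outbreak.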
For the purpose of the screening paradox, we need only focus on Eq (32), the rate of change of active infections dI/dt, which as a rate reflects the incidence of disease, but whose solution I(t), taken as an absolute value at a specific time t, yields the prevalence of disease X at that point in time (Fig 4). We can use this value to determine how the PPV fluctuates over time. The differential equation relating the changes in PPV over time t therefore becomes:
(36)
Towards this end, let us assume that the condition X has a screening test with excellent sensitivity and specificity parameters of 95 and 99 percent, respectively. According to Eq 10, this test would have a prevalence threshold of 9.3 percent. Using this threshold, we can illustrate the full SIR model as follows (Fig 4).
Note the rise of the PPV (purple) as a function of prevalence (red). The delineation of the prevalence threshold (PT) at week 10 shows a corresponding flattening of the PPV, which holds steady almost horizontally. We thus observe that as the prevalence crosses the PT, the test performs well, with greater than 90 percent predictive value. However, once the prevalence drops below the PT once again, around week 19, the PPV begins to drop anew. Of note, this is a consequence of the success of the screening test in the first place—leading to the accurate detection of disease in a higher proportion of individuals once the prevalence threshold has been crossed and people being adequately treated or quarantined to prevent further spread. In other words—as stated before—by performing and succeeding at the very task it was developed to do, a screening test paradoxically reduces its predictive ability to correctly identify individuals with the disease it screens for in the future. The degree to which this paradoxical effect is observed depends on where we set our original prevalence, ϕ0, since ζ(ϕ) depends on both ϕ0 and k.
Likewise, we can use the SIR model to demonstrate the dynamics of the epidemic of disease X in real-time, numerically, as observed in Table 1. The disease manifests over a number of weeks, affecting a peak 52 percent of this population by week 14. Because implicit to the paradox is the fact that ϕ0 > ϕk, let us for argument’s sake take the maximum prevalence as the starting prevalence point ϕ0, though this need not necessarily be the case since the principles described in this work apply regardless of ϕ0. The prevalence ϕk thus corresponds to the prevalence at each subsequent week. The corresponding PPV and the ensuing ζ(ϕ0, k) values can be seen in Table 1. Note that since our test has a sensitivity of 0.95 and a specificity of 0.99, Youden’s J statistic equals 0.95 + 0.99 − 1 = 0.94. Finally, we take the ceiling of the iteration number n to ensure that we obtain an integer number of positive test iterations (PTI) needed to surpass the prevalence threshold, thus enhancing the reliability of the screening process. Note that at the extremes of prevalence we would need to obtain 3 serial positive tests to achieve a PPV similar to that beyond the prevalence threshold. Once that threshold has been crossed, by definition . As noted above, other than developing newer, better screening tests, serial testing is one way to overcome the screening paradox [15], be it with the same test done repeatedly or using a second, different diagnostic test altogether.
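A table of this kind can be assembled with a few lines of code. The sketch below is not a reproduction of Table 1: apart from the 52 percent peak, the prevalence values are hypothetical, and the PTI formula assumes the odds-updating form with ω taken as the square root of the positive likelihood ratio.

```python
import math


def ppv(a, b, phi):
    """Positive predictive value for sensitivity a, specificity b, prevalence phi."""
    return a * phi / (a * phi + (1.0 - b) * (1.0 - phi))


def pti(a, b, phi):
    """Positive test iterations needed to reach the PPV at the prevalence threshold."""
    omega = math.sqrt(a / (1.0 - b))
    return math.ceil(math.log(omega * (1.0 - phi) / phi) / (2.0 * math.log(omega)))


def paradox_table(a, b, prevalences):
    """One row per prevalence; the first entry is treated as the starting point phi0."""
    rho0 = ppv(a, b, prevalences[0])
    rows = []
    for phi_k in prevalences:
        rho_k = ppv(a, b, phi_k)
        rows.append({"prevalence": phi_k, "ppv": rho_k,
                     "zeta": rho_k / rho0, "pti": pti(a, b, phi_k)})
    return rows


# Peak prevalence from the text (0.52) followed by hypothetical declining values.
rows = paradox_table(0.95, 0.99, [0.52, 0.30, 0.05, 0.01, 0.0001])
for row in rows:
    print(f"{row['prevalence']:<8} PPV={row['ppv']:.3f} "
          f"zeta={row['zeta']:.3f} PTI={row['pti']}")
```

Under these assumptions the PTI column rises from 1 near the peak to 3 at the lowest prevalence, consistent with the statement that three serial positive tests are needed at the extremes of prevalence.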
8 Conclusion
In this manuscript, we explore the mathematical model which formalizes the screening paradox and its implications for population-level screening programs in the three possible scenarios, each defined by the position of the initial prevalence of a condition relative to the prevalence threshold level of its screening test. Likewise, we provide a mathematical model to determine the predictive value percentage loss as the prevalence decreases and define the number of positive test iterations (PTI) needed to reverse the effects of the paradox when a single test is undertaken serially. Given their theoretical nature, clinical application of the concepts herein reported needs validation prior to implementation. Meanwhile, an understanding of how the dynamics of prevalence can affect the PPV over time can help inform clinicians as to the reliability of a screening test’s results.
Supporting information
S1 Data. Supplemental file: Table data and graph.
https://doi.org/10.1371/journal.pone.0256645.s001
(XLSX)
References
- 1. Balayla Jacques. Prevalence threshold (ϕe) and the geometry of screening curves. PLoS ONE, 15(10):e0240215, 2020. pmid:33027310
- 2. Efron Bradley. Bayes’ theorem in the 21st century. Science, 340(6137):1177–1178, 2013. pmid:23744934
- 3. Brenner Hermann and Gefeller Olaf. Variation of sensitivity, specificity, likelihood ratios and predictive values with disease prevalence. Statistics in Medicine, 16(9):981–991, 1997. pmid:9160493
- 4. Petros Michael. Revisiting the Wilson-Jungner criteria: how can supplemental criteria guide public health in the era of genetic screening? Genetics in Medicine, 14(1):129–134, 2012. pmid:22237442
- 5. Youden William J. Index for rating diagnostic tests. Cancer, 3(1):32–35, 1950. pmid:15405679
- 6. Streiner David L. Statistics commentary series: Commentary no. 36: Extending Bayes’ theorem by using likelihood ratios. Journal of Clinical Psychopharmacology, 39(6):547–549, 2019.
- 7. Wilson James Maxwell Glover, Jungner Gunnar, World Health Organization, et al. Principles and practice of screening for disease. 1968.
- 8. Baldessarini Ross J, Finklestein Seth, and Arana George W. The predictive power of diagnostic tests and the effect of prevalence of illness. Archives of General Psychiatry, 40(5):569–573, 1983. pmid:6838334
- 9. Winsten Jay A. Competition in health care: Is consumer choice in the consumer’s interest?, 1981.
- 10. Balayla Jacques. Bayesian updating and sequential testing: Overcoming inferential limitations of screening tests. arXiv preprint arXiv:2006.11641, 2020.
- 11. Weiss Howard Howie. The SIR model and the foundations of public health. Materials matematics, pages 0001–17, 2013.
- 12. Smith David, Moore Lang, et al. The SIR model for spread of disease: the differential equation model. Convergence, 2004.
- 13. Wang Haiyan and Wang Xiang-Sheng. Traveling wave phenomena in a Kermack–McKendrick SIR model. Journal of Dynamics and Differential Equations, 28(1):143–166, 2016.
- 14. Dietz Klaus. The estimation of the basic reproduction number for infectious diseases. Statistical Methods in Medical Research, 2(1):23–41, 1993. pmid:8261248
- 15. Balayla Jacques. Derivation of generalized equations for the predictive value of sequential screening tests. arXiv preprint arXiv:2007.13046, 2020.