When Should Potentially False Research Findings Be Considered Acceptable?

Summary
Ioannidis estimated that most published research findings are false [1], but he did not indicate when, if at all, potentially false research results may be considered as acceptable to society. We combined our two previously published models [2,3] to calculate the probability above which research findings may become acceptable. A new model indicates that the probability above which research results should be accepted depends on the expected payback from the research (the benefits) and the inadvertent consequences (the harms). This probability may dramatically change depending on our willingness to tolerate error in accepting false research findings. Our acceptance of research findings changes as a function of what we call “acceptable regret,” i.e., our tolerance of making a wrong decision in accepting the research hypothesis. We illustrate our findings by providing a new framework for early stopping rules in clinical research (i.e., when should we accept early findings from a clinical trial indicating the benefits as true?). Obtaining absolute “truth” in research is impossible, and so society has to decide when less-than-perfect results may become acceptable.


Essay
February 2007 | Volume 4 | Issue 2 | e26
As society pours more resources into medical research, it will increasingly realize that the research "payback" always represents a mixture of false and true findings. This tradeoff is similar to the tradeoff seen with other societal investments: for example, economic development can lead to environmental harms, while measures to increase national security can erode civil liberties. In most of the enterprises that define modern society, we are willing to accept these tradeoffs. In other words, there is a threshold (or likelihood) at which a particular policy becomes socially acceptable.
In the case of medical research, we can similarly try to define a threshold by asking: "When should potentially false research findings become acceptable to society?" In other words, at what probability are research findings determined to be sufficiently true and when should we be willing to accept the results of this research?

Defining the "Threshold Probability"
As in most investment strategies, our willingness to accept particular research findings will depend on the expected payback (the benefits) and the inadvertent consequences (the harms) of the research. We begin by defining a "positive" finding in research in the same way that Ioannidis defined it [1]. A positive finding occurs when the claim for an alternative hypothesis (instead of the null hypothesis) can be accepted at a particular, pre-specified statistical significance. The probability that a research result is true (the posterior probability; PPV) depends on: (1) the probability of it being true before the study is undertaken (the prior probability), (2) the statistical power of the study, and (3) the statistical significance of the research result. The PPV may also be influenced by bias [1,4], i.e., by systematic misrepresentation of the research due to inadequacies in the design, conduct, or analysis [1].
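The dependence of the PPV on prior probability, power, and significance level can be sketched with plain Bayes' rule. This is a minimal illustration that ignores bias; the function name and the input values are hypothetical and are not taken from the essay or from its Table 1.

```python
def positive_predictive_value(prior: float, power: float, alpha: float) -> float:
    """Probability that a statistically significant ("positive") finding is true.

    Plain Bayes' rule; bias is ignored (an assumption of this sketch).
    prior: probability that the research hypothesis is true before the study
    power: probability of a positive finding when the hypothesis is true (1 - beta)
    alpha: probability of a positive finding when the null hypothesis is true
    """
    for name, value in (("prior", prior), ("power", power), ("alpha", alpha)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    true_positive = power * prior
    false_positive = alpha * (1.0 - prior)
    if true_positive + false_positive == 0.0:
        raise ValueError("a positive finding has zero probability under these inputs")
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show how the prior drives the PPV.
    for prior in (0.5, 0.1, 0.01):
        ppv = positive_predictive_value(prior=prior, power=0.80, alpha=0.05)
        print(f"prior={prior:.2f} -> PPV={ppv:.3f}")
```

With power and significance level held fixed, a lower prior probability yields a lower PPV, which is why the same p value can carry very different evidential weight in different fields.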
However, the calculation of PPV tells us nothing about whether a particular research result is acceptable to researchers or not. Nevertheless, it can be shown that there is some probability (the "threshold probability," $p_t$) above which the results of a study will be sufficient for researchers to accept them as "true" [3]. The threshold probability will depend on the ratio of net benefits/harms (B/H) that is generated by the study [3,5,6]. Mathematically the relationship between $p_t$ and B/H can be expressed as (see Appendix, Equation A1):
$$p_t = \frac{1}{1 + B/H} \qquad (1)$$
We define net benefit as the difference between the values of the outcomes of the action taken under the research hypothesis and the null hypothesis, respectively (when in fact the research hypothesis is true). Net harms are defined as the difference between the values of the outcomes of the action taken under the null and the research hypotheses, respectively (when in fact the null hypothesis is true) [3]. It follows that if the PPV is above $p_t$ we can rationally accept the results of the research findings. Similarly, if the PPV is below $p_t$ we should accept the null hypothesis. Note that the research payoffs (the benefits) and the inadvertent consequences (harms) in Equation 1 can be expressed in a variety of units. In clinical research these units would typically be length of life, morbidity or mortality rates, absence of pain, cost, and strength of individual or societal preference for a given outcome [3].
We can now frame the crucial question of interest as: What is the minimum B/H ratio for the given PPV for which the research hypothesis has a greater value than the null hypothesis? Mathematically, this will occur when (see Appendix, Equations A1 and A2):
$$\mathrm{PPV} > \frac{1}{1 + B/H}$$
or
$$\frac{B}{H} > \frac{1 - \mathrm{PPV}}{\mathrm{PPV}} \qquad (2)$$

Calculation of the Threshold Probability of "Accepted Truth"
Figure 1 shows the threshold probability of "truth" (i.e., the probability above which the research findings may be accepted) as a function of B/H associated with the research results. The graph shows that as long as the probability of "accepted truth" (a horizontal line) is above the threshold probability curve, the research findings may be accepted. The higher the B/H ratio, the less certain we need to be of the truthfulness of the research results in order to accept them.
Note that we are following the classic decision theory approach to the results of clinical trials, which states that a rational decision maker should select the research versus the null hypothesis depending on which one maximizes the value of consequences [7-9]. In the parlance of expected utility decision theory, this means that we should choose the option with the higher expected utility [3,5,7-12]. (Expected utility is the average of all possible results weighted by their corresponding probabilities; see Appendix.) In other words, the results of the research hypothesis should be accepted when the benefits of the action outweigh its harms.
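The threshold rule described in this section can be sketched as follows. The sketch assumes that Equation 1 takes the standard threshold form p_t = 1/(1 + B/H) and that Equation 2 is its rearrangement, B/H > (1 - PPV)/PPV. This form is an assumption; it is consistent with the figures quoted in the essay (a threshold of roughly 41% for the worst-case B/H of about 1.4, and B/H > 1.44 for one of the Table 2 examples), but it is not transcribed from the authors' appendix. Function names are hypothetical.

```python
def threshold_probability(benefit_harm_ratio: float) -> float:
    """Assumed Equation 1: probability above which findings may be accepted."""
    if benefit_harm_ratio <= 0.0:
        raise ValueError("this sketch only covers a positive benefit/harm ratio")
    return 1.0 / (1.0 + benefit_harm_ratio)


def min_benefit_harm_ratio(ppv: float) -> float:
    """Assumed Equation 2: B/H that must be exceeded for a given PPV."""
    if not 0.0 < ppv <= 1.0:
        raise ValueError("ppv must lie in (0, 1]")
    return (1.0 - ppv) / ppv


def accept_research_hypothesis(ppv: float, benefit_harm_ratio: float) -> bool:
    """Accept the research hypothesis when the PPV is above the threshold."""
    return ppv > threshold_probability(benefit_harm_ratio)


if __name__ == "__main__":
    # 1.4 and 13.5 are the worst- and best-case ratios from Box 1;
    # 0.5 and 1.0 are hypothetical values added for contrast.
    for ratio in (0.5, 1.0, 1.4, 13.5):
        print(f"B/H={ratio:>5} -> threshold={threshold_probability(ratio):.3f}")
```

The printed thresholds fall as B/H rises, which is the pattern described for Figure 1: the larger the benefits relative to the harms, the less certain we need to be before accepting a finding.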

A Practical Example: When Should We Stop a Clinical Trial?
Interim analyses of clinical trials are challenging exercises in which researchers and/or data safety monitoring committees have to make a judgment as to whether to accept early promising results and terminate a trial or whether the trial should continue [13,14]. If the interim analysis shows significant benefit in efficacy for the new treatment over the standard treatment, continuing to enroll patients into the trial may mean that many patients will receive the inferior standard treatment [13,14]. The first randomized controlled trial of circumcision for preventing heterosexual transmission of HIV, for example, was terminated early after the interim analysis showed that circumcised men were less likely to be infected with HIV [15]. However, if a study is wrongly terminated for presumed benefits, this could result in adoption of a new therapy of questionable efficacy [13,14].
The results indicate that in the best-case scenario, the probability that the research findings are true far exceeds the threshold above which the results should be accepted (i.e., PPV is greater than $p_t$). Therefore, rationally, in this case we should not hesitate to accept the findings from this study as truthful. However, in the worst-case scenario, the lower limit of the PPV's 95% confidence interval intersects with the upper limit of the threshold's 95% confidence interval, indicating that under these circumstances the research hypothesis may not be acceptable (since PPV is possibly less than $p_t$). Had the investigators made a mistake when they terminated the trial early?
Figure 1 caption (partial): The horizontal yellow line indicates the actual conditional probability that the research hypothesis is true in the case of positive findings. This means that for benefit/harm ratios above the threshold (1.5 in this example), the research hypothesis can be accepted.

Dealing with Unavoidable Erroneous Research Findings
Mistakes are an integral part of research. Positive research findings may subsequently be shown to be false [18]. When we accept that our initially positive research findings were in fact false, we may discover that another alternative (i.e., the null hypothesis) would have been preferable [7,19-21]. When an initially positive research finding turns out to be false, this may bring a sense of loss or regret [19,20,22,23]. However, abundant experience has shown that there are many situations in which we can tolerate wrong decisions, and others in which we cannot [2]. We have previously described the concept of acceptable regret, i.e., under certain conditions making a wrong decision will not be particularly burdensome to the decision maker [2].

Defining Tolerable Limits for Accepting Potentially False Results
We now apply the concept of acceptable regret to address the question of whether potentially false research findings should be tolerated. In other words: which decision (regarding a research hypothesis) should we make if we want to ensure that the regret is less than a predetermined (minimal acceptable) regret, $R_0$ [2]? ($R_0$ denotes acceptable regret and should be expressed in the same units as benefits and harms.)
It can easily be shown that we should be willing to accept the results of potentially false research findings as long as the posterior probability of their being true is above the acceptable regret threshold probability, $p_r$ (see Equation 3, Appendix, and Equations A3 and A4):
$$p_r = 1 - \frac{R_0}{H} = 1 - r \cdot \frac{B}{H} \qquad (3)$$
where r is the amount of acceptable regret expressed as a percentage of the benefits that we are willing to lose in case our decision proves to be the wrong one (i.e., $R_0 = r \cdot B$). This equation describes the effect of acceptable regret on the threshold probability (Equation 1) in such a way that the PPV now also needs to be above the threshold defined in Equation 3 for the research results to become acceptable.
Box 1. Is Combined Chemotherapy Plus Radiotherapy Superior to Radiotherapy Alone for Treating Esophageal Cancer?
The Radiation Oncology Cooperative Group conducted a randomized controlled trial to evaluate the effects of combined chemotherapy and radiotherapy versus radiotherapy alone in patients with cancer of the esophagus [28].
A sample size of 150 patients was planned to detect an improvement in the two-year survival rate from 10%-30% in favor of combined Rx (at α = 0.05 and β = 0.10). At the interim analysis, 88% of patients in the control group (RT) had died while only 59% in the experimental arm (combined Rx) had died, resulting in a survival advantage of 29% in favor of combined Rx (p < 0.001).
For this reason, the trial was terminated prematurely after enrolling 121 patients. Two percent of patients died as a result of treatment in the combined Rx group versus 0% in the RT arm. Thus, the observed net benefit/harm ratio in this trial was (88 - 59 - 2)/2 = 13.5 [29] (the best-case scenario).
For our worst-case scenario we assume that two-thirds of patients who experienced life-threatening toxicities with combined Rx (12%) will have died. This will result in the worst-case net benefit/harm ratio = (88 - 59 - 12)/12 = 1.4.
The trial was stopped using classic inferential statistics which indicated that the probability of the observed results, assuming the null hypothesis that combined Rx is equivalent to RT, was extremely small (p < 0.001). This, however, tells us nothing about how true the alternative hypothesis is [16,17], i.e., in our case, what is the probability that combined Rx is better than RT? The probability that the research finding is true [16,17] (i.e., that combined Rx is truly a better treatment than RT) under the best-case scenario is 95% [95% CI, 89%-99.9%]. Under the worst-case scenario, the probability that combined Rx is better than RT is 80% [95% CI, 61%-99%].

Note that actions under expected utility theory (EUT) and acceptable regret may not necessarily be identical, but arguably the most rational course of action would be to select those
A practical interpretation of this inequality is that some research findings may never become acceptable unless we are ready to violate the axioms of EUT, i.e., accept value r to be larger than defined in Equation 4 (Table 2).
We return now to the "real life" scenario above, i.e., the dilemma of whether to stop a clinical trial early. In our worst-case analysis (Box 1), we found that the probability that combined Rx is better than radiotherapy alone could potentially be as low as 80% [95% CI, 61%-99%]. This figure overlaps with the probability of the threshold of 41% [95% CI, 11%-72%] above which research findings are acceptable under the worst-case scenario (see Table 1) (i.e., PPV is possibly less than $p_t$; see Equations 1 and 2). Thus, it is quite conceivable that the investigators made a mistake when they closed the trial prematurely.
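The arithmetic behind the two scenarios, and the overlap of the two confidence intervals, can be checked with a short sketch. The percentages and intervals are those quoted in Box 1 and in the paragraph above; the helper names are hypothetical, and the threshold line assumes that Equation 1 has the form p_t = 1/(1 + B/H).

```python
def net_benefit_harm_ratio(control_mortality: float,
                           treated_mortality: float,
                           treatment_related_mortality: float) -> float:
    """(Deaths avoided minus treatment-related deaths) / treatment-related deaths.

    All three arguments are percentages of patients.
    """
    if treatment_related_mortality <= 0.0:
        raise ValueError("the ratio is undefined without treatment-related harms")
    net_benefit = control_mortality - treated_mortality - treatment_related_mortality
    return net_benefit / treatment_related_mortality


def intervals_overlap(first: tuple, second: tuple) -> bool:
    """True when two closed intervals (low, high) share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]


# Best case: 2% treatment-related deaths; worst case: 12% (Box 1).
best_case = net_benefit_harm_ratio(88, 59, 2)
worst_case = net_benefit_harm_ratio(88, 59, 12)

# Assumed form of Equation 1, applied to the unrounded worst-case ratio.
worst_case_threshold = 1.0 / (1.0 + worst_case)

# 95% confidence intervals quoted in the text for the worst-case scenario.
ppv_ci_worst = (0.61, 0.99)
threshold_ci_worst = (0.11, 0.72)
ambiguous = intervals_overlap(ppv_ci_worst, threshold_ci_worst)

if __name__ == "__main__":
    print(f"best-case B/H  = {best_case:.1f}")
    print(f"worst-case B/H = {worst_case:.1f}")
    print(f"worst-case threshold = {worst_case_threshold:.2f}")
    print(f"PPV and threshold intervals overlap: {ambiguous}")
```

Interval overlap is used here only as a simple screen for the situation the text describes, in which the PPV may plausibly lie below the threshold; it is not a formal test.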
One way to handle situations in which evidence is not solidly established is to explicitly take into account the possibility that one can make a mistake and wrongly accept the results of a research hypothesis. Accepting this possibility can, in turn, help us determine "decision thresholds" that will take into account the amount of error which may or may not be particularly troublesome to us if we wrongly accept research findings.
Let us assume that the investigators in the esophageal cancer trial are prepared to accept that they may be wrong and that they were willing to forgo 10%, 30%, or 67% of benefits. Using Equation 3, the calculations in Box 2 and Figure 2 show that for any willingness to tolerate a loss of net benefits greater than 10%, the probability that combined Rx is superior to RT is above all decision thresholds (since $p_r$ = 0 in the best-case scenario; Equation 3). Therefore the investigators seemed to have been correct when they terminated the trial earlier than originally anticipated.
Table 2 summarizes the results of most types of clinical research showing the probabilities that the research findings are true and the benefit/harm ratio above which the findings become acceptable. For each type of research, the table shows these probabilities with and without acceptable regret being taken into account. What is remarkable is that depending on the amount of acceptable regret, our acceptance of potentially false research findings may dramatically change. For example, in the case of a meta-analysis of small inconclusive studies, we can accept the research hypothesis as true only if B/H > 1.44. However, if we are willing to forgo, say, only 1% of the net benefits in case we prove to be mistaken, the B/H ratio for accepting the findings from the meta-analysis of small inconclusive studies dramatically increases to 59.
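The jump from 1.44 to 59 in the meta-analysis example can be traced with a small sketch. It assumes that the expected-utility condition is B/H > (1 - PPV)/PPV and that Equation 3 has the form p_r = 1 - r(B/H), so that PPV >= p_r rearranges to B/H >= (1 - PPV)/r. Both forms are assumptions that happen to reproduce the quoted numbers. The PPV used below is back-calculated from the quoted B/H of 1.44 and is not read from Table 2; function names are hypothetical.

```python
def min_ratio_expected_utility(ppv: float) -> float:
    """B/H that must be exceeded under expected utility (assumed Equation 2)."""
    if not 0.0 < ppv <= 1.0:
        raise ValueError("ppv must lie in (0, 1]")
    return (1.0 - ppv) / ppv


def min_ratio_acceptable_regret(ppv: float, regret_fraction: float) -> float:
    """Smallest B/H for which PPV >= 1 - r * B/H (assumed Equation 3).

    regret_fraction is r, the share of benefits we are willing to forgo.
    """
    if not 0.0 < ppv <= 1.0:
        raise ValueError("ppv must lie in (0, 1]")
    if regret_fraction <= 0.0:
        raise ValueError("with r = 0 no finite benefit/harm ratio is sufficient")
    return (1.0 - ppv) / regret_fraction


# PPV implied by the quoted requirement B/H > 1.44 (an inference, not a table value).
ppv_meta_analysis = 1.0 / (1.0 + 1.44)

ratio_without_regret = min_ratio_expected_utility(ppv_meta_analysis)
ratio_with_one_percent_regret = min_ratio_acceptable_regret(ppv_meta_analysis, 0.01)

if __name__ == "__main__":
    print(f"implied PPV              = {ppv_meta_analysis:.2f}")
    print(f"required B/H, EUT only   = {ratio_without_regret:.2f}")
    print(f"required B/H with r = 1% = {ratio_with_one_percent_regret:.0f}")
```

Under these assumptions the required ratio scales with 1/r, so a very small tolerance for regret demands a very large benefit/harm ratio before a finding with a modest PPV becomes acceptable.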

Conclusion
In the final analysis, the answer to the question posed in the title of this paper, "When should potentially false research findings be considered acceptable?" has much to do with our beliefs about what constitutes knowledge itself [24]. The answer depends on the question of how much we are willing to tolerate the research results being wrong. Equation 3 shows an important result: if we are not willing to accept any possibility that our decision to accept a research finding could be wrong (r = 0), that would mean that we can operate only at absolute certainty in the "truth" of a research hypothesis (i.e., PPV = 100%). This is clearly not an attainable goal [1]. Therefore, our acceptability of "truth" depends on how much we care about being wrong. In our attempts to balance these tradeoffs, the value that we place on benefits, harms, and degrees of errors that we can tolerate becomes crucial.
However, because a typical clinical research hypothesis is formulated to test for benefits, we have here postulated a relationship between acceptable regret and the fraction of benefits that we are willing to forgo in the case of false research findings. Unfortunately, when we move outside the realm of medical treatments and interventions, the immediate and long-term harms and benefits are very difficult to quantify. On occasion, wrongly adopting some false positive findings may lead to the adoption of other false findings, thus creating fields replete with spurious claims. One typical example is the use of stem cell transplant for breast cancer, which resulted in tens of thousands of women getting aggressive, toxic, and very expensive treatment based on strong beliefs obtained in early phase I/II trials until controlled, randomized trials demonstrated no benefits but increased harms of stem cell transplants compared with conventional chemotherapy [25]. Therefore, even for clinical medicine, where benefits and harms are more typically measured, we should acknowledge that often the quality of the information on harms is suboptimal [26]. There is no guarantee that the "benefits" will exceed the "harms." Although (as noted in Text S1) there is nothing to prevent us from relating $R_0$ to harms, or both benefits and harms, one must acknowledge that there is much more uncertainty, often total ignorance, about harms (since data on harms is often limited). As a consequence, under these circumstances research may become acceptable only if we relax our criteria for acceptable regret, i.e., accept value r to be larger than defined in Equation 4. In other words, unless we are ready to violate the precepts of rational decision making (see the figures in red in Table 2), a research finding with low PPV (the majority of research findings) should not be accepted [1].
We conclude that since obtaining the absolute "truth" in research is impossible, society has to decide when less-than-perfect results may become acceptable. The approach presented here, advocating that the research hypothesis should be accepted when it is coherent with beliefs "upon which a man is prepared to act" [27], may facilitate decision making in scientific research.
Figure 2 caption (partial): The calculated (acceptable regret) threshold above which we should accept research findings is shown for the worst-case scenario (B/H = 1.4; see text for details) with a (hypothetical) assumption that we are willing to forgo 30% of the benefits (slanted line). The calculated threshold probability (acceptable regret threshold) has a value of 58% when B/H = 1.4 (the horizontal line). This means that as long as the probability that research findings are true is above this acceptable regret threshold, these research findings could be accepted with a tolerable amount of regret in case the research hypothesis proves to be wrong (for didactic purposes only one acceptable regret threshold is shown). See Box 2 and text for details.

Supporting Information
Text S1. Longer version of the paper.

Box 2. Determining the Threshold Above Which Research Findings Are Acceptable When Acceptable Regret Is Taken Into Account
You will recall (in Box 1) that the Radiation Oncology Cooperative Group investigators hoped to detect an absolute difference of 10%-30% in survival in favor of combined Rx. By finding that combined Rx improved survival by 29%, they appeared to have realized their most optimistic expectations [28]. This implies that the investigators would consider their trial a success even if the survival was improved by 10% instead, i.e., 67% less than the realized, most optimistic outcome [(1 - 0.10/0.30) × 100% = 67%].
Therefore, we assume that the investigators in the esophageal cancer trial are prepared to accept that they may be wrong and that they were willing to forgo 10%, 30%, or 67% of benefits.
We applied Equation 3 to calculate acceptable regret thresholds above which we can accept research findings as true (i.e., when PPV > $p_r$).
Best-case scenario (benefit/harm ratio: 13.5). The calculated thresholds above which we should accept the findings are zero, regardless of whether our tolerable loss of benefits was 10%, 30%, or 67%. Note that these thresholds ($p_r$ = 0) are well below the calculated probability that the research hypothesis is true [PPV = 95% (88%-99.9%)] (i.e., PPV > $p_r$ = 0 for all acceptable regret assumptions; Equation 3, Table 1) and hence the research hypothesis should be accepted.
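The Box 2 calculation can be sketched as follows, assuming that Equation 3 has the form p_r = 1 - r(B/H) and that negative values are reported as zero. Both are assumptions; they are consistent with the zero thresholds quoted for the best-case scenario and with the 58% quoted for B/H = 1.4 at 30% acceptable regret. The function name is hypothetical.

```python
def acceptable_regret_threshold(benefit_harm_ratio: float, regret_fraction: float) -> float:
    """Assumed Equation 3, floored at zero: p_r = max(0, 1 - r * B/H)."""
    if benefit_harm_ratio <= 0.0:
        raise ValueError("this sketch only covers a positive benefit/harm ratio")
    if not 0.0 <= regret_fraction <= 1.0:
        raise ValueError("regret_fraction must lie in [0, 1]")
    return max(0.0, 1.0 - regret_fraction * benefit_harm_ratio)


# Largest share of the hoped-for benefit the investigators could forgo:
# a 10% improvement instead of the most optimistic 30%.
max_forgone = 1.0 - 0.10 / 0.30

# Best-case scenario from Box 1 (B/H = 13.5) at the three tolerances in Box 2.
regret_fractions = (0.10, 0.30, 0.67)
best_case_thresholds = [acceptable_regret_threshold(13.5, r) for r in regret_fractions]

# Best-case PPV point estimate quoted in the text.
ppv_best_case = 0.95
accept_best_case = all(ppv_best_case > p_r for p_r in best_case_thresholds)

# Worst-case scenario (B/H = 1.4) at 30% acceptable regret, as in Figure 2.
worst_case_threshold = acceptable_regret_threshold(1.4, 0.30)

if __name__ == "__main__":
    print(f"maximum share of benefit forgone = {max_forgone:.0%}")
    print(f"best-case thresholds = {best_case_thresholds}")
    print(f"accept in best case: {accept_best_case}")
    print(f"worst-case threshold at r = 30%: {worst_case_threshold:.2f}")
```

Setting `regret_fraction` to zero returns a threshold of 1, which mirrors the point made in the Conclusion: with no tolerance for regret, only absolute certainty would suffice.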