Simple changes of individual studies can improve the reproducibility of the biomedical scientific process as a whole

We developed a new probabilistic model to assess the impact of recommendations for rectifying the reproducibility crisis (publishing both positive and ‘negative’ results and increasing statistical power) on competing objectives, such as discovering causal relationships, avoiding the publication of false positive results, and reducing resource consumption. In contrast to recent publications, our model quantifies the impact of each suggestion not only for an individual study but, above all, in relation to the other suggestions and to the overall scientific process. We prove that higher-powered experiments can save resources in the overall research process without generating excess false positives. The better the quality of the pre-study information and its exploitation, the more likely this beneficial effect is to occur. Additionally, we quantify the adverse effects of neglecting good practices in the design and conduct of hypothesis-based research and of omitting the publication of ‘negative’ findings. Our contribution is a plea for adherence to, or reinforcement of, good scientific practice and for the publication of ‘negative’ findings.


Content
In this supplement, we derive formulas for the expectations of the scientific gain, the number of false positives, and the total number of samples needed in the entire research process. We first describe the Monte Carlo simulation of the scenario presented in Fig 2 (main text), which was used to test the validity of these formulas. Subsequently, the formulas are presented for the case that only one factor is tested (n F = 1). We then advance to more general equations for any number of factors considered for a research problem. From these equations, general properties can be derived. In particular, it can be proven that the higher π k , the higher the statistical power should be. The corresponding proof is located at the end of this supplement. In addition, formulas for more than one causal factor are presented. For orientation, please refer to the list of all relevant variables below.

List of model variables
E{.}: (statistical) expected value of (.)
F i : potential factors to be tested
n a : sample size in a single experiment
n F : number of factors (hypotheses) considered for a research problem
n cF : number of (unknown) causal factors (true hypotheses)
n t : number of research teams
n K : number of needed studies to finish research
n total : total number of samples
n p : number of factors (hypotheses) tested in parallel
π k : conditional pre-study probability (of F i )
P pub : probability to publish negative results
u: bias
α: significance level
β: false negative rate (1 − β = power)
β min : minimal value of β regarding E{n total }
δ: standardized effect size
ω i : objective pre-study probability

Description of the Monte Carlo simulation
In our research scenario (main text, Fig 2) we assume that n F factors are under consideration (F 1 , … , F i , … , F n F ). One or more of these factors are causal for the research problem under investigation. These causal factor(s) are placed on rank(s) between 1 and n F using information from hypothetical exploratory pilot studies. If F 1 is causal, then F 1 = 1, and if F 1 is non-causal, then F 1 = 0, and so on. The probability of F 1 = 1 is given by ω 1 . In the Monte Carlo (MC) simulation, this ranking is randomised in accordance with ω i in each iteration. The resulting rankings are stored in a matrix whose dimensions are determined by the number of factors under consideration (n F : columns) and the number of simulation runs (N: rows). If there is only one causal factor, the sum of the matrix elements of column i divided by N gives approximately ω i .
The simulation procedure starts with the first team (team 1) testing F 1 : If F 1 = 1, F 1 is tested positive with the probability 1 − β. In this case and if the number of causal factors n cF = 1, the number of true positives is increased by 1 and the research process stops. If F 1 is tested negatively (a false negative), this result will be published with the probability P pub . If published, all other teams know this negative result and F 2 is tested next. If the result is not published, team 2 will test F 1 . The testing of F 1 is repeated until A) one team tests F 1 positively (we assume that all positive results are published), or B) one team publishes a negative result, or C) all teams have tested F 1 negatively. Every test conducted increases the number of total samples by the sample size needed for this single experiment.
If F 1 = 0, the factor is tested positively with the probability α. The number of false positives is then increased by 1. After falsification of the result (an intrinsic characteristic of our simplified scenario) all teams know that the positive result is false and F 2 is tested next. If F 1 is tested negatively, it will be published with the probability P pub . If published, all other teams know the negative result and F 2 is tested next. If the result is not published, team 2 will test F 1 . Again, testing of F 1 is repeated until A) one team tests F 1 false positively (we assume that all positive results are published), or B) one team publishes a true negative result, or C) all teams have tested F 1 negatively. Every test conducted increases the number of total samples by the sample size needed for this single experiment.
The process continues until either A) all factors are tested and all teams know the results (either by reading the publication or by testing themselves) or B) all truly causal factors (n cF ) were tested positively. During the simulation, all true and false positive results and all samples needed in the process are counted. We assume independence among the results obtained by different teams, i.e. false positive and false negative study outcomes are not correlated given the true status (causal/not causal) of the factor.
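The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: it assumes a single causal factor whose rank follows the conditional pre-study probability π k , it counts studies rather than samples (multiply by n a to obtain samples), and the names `simulate_once` and `estimate` are hypothetical.

```python
import random


def simulate_once(rng, n_f, n_t, pi_k, alpha, beta, p_pub):
    """Run one research process; return (studies, true_pos, false_pos)."""
    # Place the single causal factor: rank i is causal with conditional
    # probability pi_k given that no earlier rank is causal (assumption).
    causal_rank = None
    for i in range(n_f):
        if rng.random() < pi_k:
            causal_rank = i
            break

    studies = true_pos = false_pos = 0
    for i in range(n_f):
        is_causal = i == causal_rank
        p_positive = (1 - beta) if is_causal else alpha
        positive = False
        for _team in range(n_t):
            studies += 1  # every test conducted counts as one study
            if rng.random() < p_positive:
                positive = True  # positive results are always published
                break
            if rng.random() < p_pub:
                break  # negative result published; no team repeats the test
        if positive and is_causal:
            true_pos += 1
            break  # the causal factor is found and the process stops
        if positive:
            false_pos += 1  # falsified afterwards; the next factor is tested
    return studies, true_pos, false_pos


def estimate(n_runs, seed, **params):
    """Average the three outputs over n_runs simulation runs."""
    rng = random.Random(seed)
    totals = [0, 0, 0]
    for _ in range(n_runs):
        for k, value in enumerate(simulate_once(rng, **params)):
            totals[k] += value
    return tuple(t / n_runs for t in totals)
```

Calling `estimate(20000, 1, n_f=5, n_t=3, pi_k=0.3, alpha=0.05, beta=0.2, p_pub=0.5)` returns Monte Carlo estimates of the mean number of studies, the scientific gain, and the number of false positives, which can then be compared with the closed-form expectations.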
It is, however, also possible to derive formulas for the three outputs of the Monte Carlo simulation, i.e. the expectations of the scientific gain, the number of false positives, and the total number of samples. For this, we first need to introduce equations for the pre-study probabilities. These equations are governed by the parameter π k , which represents the pre-study information.
Additionally, we introduced a different scenario. Here, more than one factor is tested in parallel by different research teams. The number of factors tested in parallel is denoted by n p and usually n p < n F . If all n p test results are published and the truly causal factor is not detected, the next n p factors will be tested, and so on until all factors are tested or the truly causal factor is detected. If, in contrast, a negative test result is not published, one of the teams that have not yet tested the corresponding factor will repeat this test. As in the scenario described above, this testing is iterated until one test result is published or all teams have tested the factor.
A further extension of the model allows for errors in the validation process. An error of the first kind in the validation means that a false positive result is canonized, i.e. the process stops with a false result. An error of the second kind in the validation means that the true factor cannot be found.

Derivation of equations  Pre-study probabilities
We picture that n t research teams consider n F factors as potential solutions to a research problem, i.e. as causal for an effect under investigation. For simplification, we initially assume that there is only one causal factor (n cF = 1) that explains the observed effect. An exploratory study is performed once which, together with an evaluation procedure, ranks the n F factors according to their likelihood to be causal. Since the outcomes of the exploratory studies can be regarded as random variables, there is an objective pre-study probability that the causal factor is placed on rank i (ω i ) in this simulation study.
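For interpretation of the model outcomes, three properties of ω i are advantageous: (i) the probabilities should be decreasing (ω 1 ≥ ω 2 ≥ ⋯ ≥ ω n F ), i.e. the ranking is optimal, (ii) the conditional probability (π k ) for F i = 1 given F 1 = 0, … , F i−1 = 0 should be constant for all i, and (iii) the asymptotic sum over all ω i should be one, $\sum_{i=1}^{\infty} \omega_i = 1$. It can easily be derived that
$$\omega_i = \pi_k (1 - \pi_k)^{i-1} \tag{1}$$
fulfils (i) and (iii). Equation (1) also fulfils (ii), as the conditional probability is given by
$$\frac{\omega_i}{1 - \sum_{j=1}^{i-1} \omega_j} = \frac{\pi_k (1 - \pi_k)^{i-1}}{(1 - \pi_k)^{i-1}} = \pi_k .$$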
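The three properties can be checked numerically. The sketch below assumes the geometric form ω i = π k (1 − π k )^(i−1), which the supplement also uses as the base case of its recursion for several causal factors; the function names are illustrative only.

```python
def omega(i, pi_k):
    """Objective pre-study probability that the causal factor is on rank i."""
    return pi_k * (1 - pi_k) ** (i - 1)


def conditional_probability(i, pi_k):
    """P(F_i = 1 | F_1 = 0, ..., F_{i-1} = 0), computed from the omegas."""
    not_yet_found = 1 - sum(omega(j, pi_k) for j in range(1, i))
    return omega(i, pi_k) / not_yet_found
```

For any rank i, `conditional_probability(i, 0.3)` returns 0.3 up to rounding, which is property (ii).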

Expectation of the total number of samples (E{n total }; n F = 1)
The only factor considered in this case (F 1 ) can be either causal or not. The expected number of studies performed on this factor is the weighted sum of the expected numbers for both cases. Let us first assume that F 1 is causal. The factor will be tested exactly i times, if the following conditions are fulfilled:
• F 1 was tested negatively i − 1 times
• none of the corresponding studies were published
• F 1 is tested positively with the i-th trial or tested negatively with that trial and the study is published.
The maximum number of studies is given by the number of research teams in the field (n t ). If F 1 has already been tested by n t − 1 teams and none of the tests was published, then F 1 will be tested n t times regardless of the outcome of the last experiment.
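The expectation of the number of studies for F 1 = 1 is given by the following series:
$$n_{e,1} = \{(1 - P_{pub})\beta\}^{n_t - 1}\, n_t + \sum_{i=1}^{n_t - 1} i\, \{(1 - P_{pub})\beta\}^{i-1} \{1 - (1 - P_{pub})\beta\} \tag{2}$$
This equation can be simplified using the series $\sum_{i=1}^{m} i\, x^{i-1} = \frac{1 - (m+1)x^{m} + m x^{m+1}}{(1-x)^2}$. We thus obtain:
$$n_{e,1} = \frac{1 - \{(1 - P_{pub})\beta\}^{n_t}}{1 - (1 - P_{pub})\beta} \tag{3}$$
Analogously we derive the expected number of studies n f,1 for the case that F 1 = 0. Here, β is replaced by 1 − α, which is the probability to obtain a true negative result. This transformation yields:
$$n_{f,1} = \frac{1 - \{(1 - P_{pub})(1 - \alpha)\}^{n_t}}{1 - (1 - P_{pub})(1 - \alpha)} \tag{4}$$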
The expected number of studies is then:
$$E\{n_K\}(\alpha, \beta, n_F, \pi_k, P_{pub}) = \omega_1 n_{e,1} + (1 - \omega_1)\, n_{f,1} \tag{5}$$
The expectation of the total number of samples used in the research process is given by the product of the number of samples per study n a (α, β, δ) and the expectation of the number of studies:
$$E\{n_{total}\} = n_a(\alpha, \beta, \delta) \times E\{n_K\} \tag{6}$$
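The step from the series to the closed form can be verified numerically. In the sketch below, q stands for the probability that a test is negative and remains unpublished, i.e. q = (1 − P pub )β for a causal factor and q = (1 − P pub )(1 − α) for a non-causal one; the function names are hypothetical and the closed form is the truncated geometric sum implied by the counting argument above.

```python
def expected_tests_series(q, n_t):
    """Series form: the factor is tested exactly i times, i = 1..n_t."""
    total = n_t * q ** (n_t - 1)  # all n_t - 1 earlier tests stayed unpublished
    for i in range(1, n_t):
        total += i * q ** (i - 1) * (1 - q)
    return total


def expected_tests(q, n_t):
    """Closed form (1 - q^n_t) / (1 - q) of the same expectation."""
    if q == 1.0:
        return float(n_t)  # limit of the closed form for q -> 1
    return (1 - q ** n_t) / (1 - q)


def expected_studies_single(omega_1, alpha, beta, p_pub, n_t):
    """Weighted sum over the causal and the non-causal case for n_F = 1."""
    n_e1 = expected_tests((1 - p_pub) * beta, n_t)
    n_f1 = expected_tests((1 - p_pub) * (1 - alpha), n_t)
    return omega_1 * n_e1 + (1 - omega_1) * n_f1
```

With P pub = 1 every factor is tested exactly once, so `expected_studies_single(0.5, 0.05, 0.2, 1.0, 5)` evaluates to 1.0 regardless of the number of teams.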

Expectation of the number of false positives (E{fp}; n F = 1)
A false positive publication is of course only possible, if F 1 = 0, which is the case with the probability 1 − ω 1 .
The probability that team i is testing F 1 positively is given by the probability α multiplied by the probabilities that the preceding i − 1 teams tested F 1 negatively ($(1 - \alpha)^{i-1}$) and did not publish these results ($(1 - P_{pub})^{i-1}$). The probability that F 1 is tested positively by any one team is obtained by summing these probabilities over all n t teams (in our model, it is not possible that F 1 is tested positive by more than one team).
Using the geometric series we thus obtain:
$$E\{fp\} = (1 - \omega_1)\, \alpha\, \frac{1 - \{(1 - P_{pub})(1 - \alpha)\}^{n_t}}{1 - (1 - P_{pub})(1 - \alpha)} = (1 - \omega_1)\, fp_1 \tag{7}$$
Here, fp 1 is the expected number of false positives given F 1 = 0. Similarly we derive the equation for the expected scientific gain, i.e. the number of true positives. Here (1 − α) is replaced by β and 1 − ω 1 by ω 1 . Therefore
$$E\{g\} = \omega_1 (1 - \beta)\, \frac{1 - \{(1 - P_{pub})\beta\}^{n_t}}{1 - (1 - P_{pub})\beta} = \omega_1\, g_1 \tag{8}$$
Here, g 1 is the expected number of true positives given F 1 = 1.
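The summation over teams and its geometric-series closed form can be compared directly. The following sketch uses hypothetical function names and assumes the per-team probabilities exactly as described in the text: team i is the first to obtain a positive result only if the i − 1 preceding tests were negative and unpublished.

```python
def fp_1_by_team(alpha, p_pub, n_t):
    """Sum over teams i of P(team i is the first to test F_1 positively)."""
    return sum(alpha * ((1 - alpha) * (1 - p_pub)) ** (i - 1)
               for i in range(1, n_t + 1))


def fp_1(alpha, p_pub, n_t):
    """Closed form: expected false positives given F_1 = 0."""
    q = (1 - alpha) * (1 - p_pub)
    if q == 1.0:
        return alpha * n_t  # limit for q -> 1 (only reached for alpha = 0)
    return alpha * (1 - q ** n_t) / (1 - q)


def g_1(beta, p_pub, n_t):
    """Closed form: expected true positives given F_1 = 1."""
    q = beta * (1 - p_pub)
    if q == 1.0:
        return (1 - beta) * n_t  # limit for q -> 1 (only reached for beta = 1)
    return (1 - beta) * (1 - q ** n_t) / (1 - q)
```

Two limiting cases are instructive: with P pub = 1 only one team tests the factor, so g 1 = 1 − β, whereas with P pub = 0 all n t teams may test it and g 1 = 1 − β^(n t ).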

Introduction of bias into the equations
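To introduce bias we rely on the approach of [1], where bias is defined as the "proportion of probed analyses that would not have been "research findings," but nevertheless end up presented and reported as such, because of bias" (denoted by u). In equations (3), (4), (7) and (8) β is therefore replaced by β(1 − u) and 1 − α by (1 − α)(1 − u). Thereby we obtain the following equations:
$$n_{e,1} = \frac{1 - \{(1 - P_{pub})\beta(1 - u)\}^{n_t}}{1 - (1 - P_{pub})\beta(1 - u)} \tag{9}$$
$$n_{f,1} = \frac{1 - \{(1 - P_{pub})(1 - \alpha)(1 - u)\}^{n_t}}{1 - (1 - P_{pub})(1 - \alpha)(1 - u)} \tag{10}$$
$$fp_1 = \{1 - (1 - \alpha)(1 - u)\}\, \frac{1 - \{(1 - P_{pub})(1 - \alpha)(1 - u)\}^{n_t}}{1 - (1 - P_{pub})(1 - \alpha)(1 - u)} \tag{11}$$
$$g_1 = \{1 - \beta(1 - u)\}\, \frac{1 - \{(1 - P_{pub})\beta(1 - u)\}^{n_t}}{1 - (1 - P_{pub})\beta(1 - u)} \tag{12}$$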
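In order to determine the impact of β and α on scientific gain and false positive rate, respectively, we differentiate (11) and (12). With the abbreviations $a = (1 - P_{pub})(1 - \alpha)(1 - u)$ and $b = (1 - P_{pub})\beta(1 - u)$, the derivative of (11) with respect to α yields:
$$\frac{\partial fp_1}{\partial \alpha} = (1 - u) \left\{ P_{pub} \sum_{i=1}^{n_t - 1} i\, a^{i-1} + n_t\, a^{n_t - 1} \right\} \ge 0 \tag{13}$$
The derivative of (12) with respect to β yields:
$$\frac{\partial g_1}{\partial \beta} = -(1 - u) \left\{ P_{pub} \sum_{i=1}^{n_t - 1} i\, b^{i-1} + n_t\, b^{n_t - 1} \right\} \le 0 \tag{14}$$
These equations prove that fp 1 increases with α while g 1 decreases with β.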
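The monotonicity statement can be checked numerically. The sketch below is an illustration under stated assumptions: bias enters by replacing β with β(1 − u) and 1 − α with (1 − α)(1 − u), as described above, and `dfp_1_dalpha` is one possible series form of the derivative obtained by differentiating the code's own `fp_1`; it is not claimed to match the authors' notation. All names are hypothetical.

```python
def fp_1(alpha, u, p_pub, n_t):
    """Expected false positives given F_1 = 0, with bias u."""
    neg = (1 - alpha) * (1 - u)      # P(negative result), non-causal factor
    a = (1 - p_pub) * neg            # P(negative and unpublished)
    return (1 - neg) * sum(a ** k for k in range(n_t))


def g_1(beta, u, p_pub, n_t):
    """Expected true positives given F_1 = 1, with bias u."""
    neg = beta * (1 - u)             # P(negative result), causal factor
    b = (1 - p_pub) * neg
    return (1 - neg) * sum(b ** k for k in range(n_t))


def dfp_1_dalpha(alpha, u, p_pub, n_t):
    """Series form of d fp_1 / d alpha (assumed form, see lead-in)."""
    a = (1 - p_pub) * (1 - alpha) * (1 - u)
    series = sum(i * a ** (i - 1) for i in range(1, n_t))
    return (1 - u) * (p_pub * series + n_t * a ** (n_t - 1))


def is_monotone(values, increasing):
    """True if the sequence never moves against the stated direction."""
    pairs = list(zip(values, values[1:]))
    return all((x <= y) if increasing else (x >= y) for x, y in pairs)
```

Evaluating `fp_1` on a grid of α values and `g_1` on a grid of β values and passing the results to `is_monotone` confirms the stated directions for the chosen parameters.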

Equations for arbitrary number of factors (n F ≥ 1)
The formula for the total number of samples for any number of considered factors n F is derived by the following consideration: If F i = 1, which is the case with the probability ω i , the mean number of tests of this factor is given by n e,1 . In this case, the preceding i − 1 non-causal factors were each tested n f,1 times on average. In the case that F i was not recognized as the causal factor, which happens with the probability 1 − g 1 , an additional n F − i non-causal factors will be tested n f,1 times on average. In the case that none of the considered factors is causal, n F × n f,1 studies will be performed on average. Thus
$$E\{n_{total}\} = n_a(\alpha, \beta, \delta) \left\{ \sum_{i=1}^{n_F} \omega_i \left[ n_{e,1} + (i - 1)\, n_{f,1} + (1 - g_1)(n_F - i)\, n_{f,1} \right] + \left( 1 - \sum_{i=1}^{n_F} \omega_i \right) n_F\, n_{f,1} \right\} = n_a(\alpha, \beta, \delta) \times E\{n_K\}. \tag{15}$$
In order to use the parametrization of the pre-study probabilities, we introduce some abbreviations:
$$f_1(\pi_k, n_F) = \sum_{i=1}^{n_F} \omega_i = 1 - (1 - \pi_k)^{n_F}$$
and
$$f_2(\pi_k, n_F) = \sum_{i=1}^{n_F} i\, \omega_i = \frac{1 - (1 + n_F \pi_k)(1 - \pi_k)^{n_F}}{\pi_k}.$$
With this notation the expected number of studies in equation (15) can be written as:
$$E\{n_K\} = f_1 (n_{e,1} - n_{f,1}) + g_1 (f_2 - n_F f_1)\, n_{f,1} + n_F\, n_{f,1}. \tag{16}$$
The equation for the expected number of false positives is derived similarly and we obtain:
$$E\{fp\} = \{ g_1 (f_2 - n_F f_1) + n_F - f_1 \}\, fp_1. \tag{17}$$
Equation (17) shows that the number of false positives increases with β, because g 1 decreases with β and g 1 is the only term in (17) which depends on β. This also means that the absolute number of false positive results decreases with increasing sample size n a and vice versa.
The expectation of the scientific gain is given by the product of the probability that the causal factor is among the n F factors considered ($\sum_{i=1}^{n_F} \omega_i$) and the probability that a causal factor will be tested positively by one of the n t teams (g 1 ):
$$E\{g\} = \sum_{i=1}^{n_F} \omega_i\, g_1 = f_1 g_1 \tag{18}$$
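The expectations for arbitrary n F can be evaluated as follows. The sketch assumes ω i = π k (1 − π k )^(i−1), no bias (u = 0), and f 2 taken as the sum of i ω i ; the false-positive expression follows the same counting argument as the one given for the number of studies. The function names are illustrative, and `expected_studies_direct` re-implements the verbal consideration term by term as a cross-check.

```python
def geometric_sum(q, n_t):
    """Expected number of tests of one factor: sum of q^k for k < n_t."""
    return sum(q ** k for k in range(n_t))


def expectations(n_f, n_t, pi_k, alpha, beta, p_pub):
    """Return (E{n_K}, E{fp}, E{g}) from the abbreviated formulas."""
    n_e1 = geometric_sum((1 - p_pub) * beta, n_t)
    n_f1 = geometric_sum((1 - p_pub) * (1 - alpha), n_t)
    g1 = (1 - beta) * n_e1
    fp1 = alpha * n_f1
    omegas = [pi_k * (1 - pi_k) ** (i - 1) for i in range(1, n_f + 1)]
    f1 = sum(omegas)
    f2 = sum(i * w for i, w in enumerate(omegas, start=1))
    studies = f1 * (n_e1 - n_f1) + g1 * (f2 - n_f * f1) * n_f1 + n_f * n_f1
    false_pos = (g1 * (f2 - n_f * f1) + n_f - f1) * fp1
    gain = f1 * g1
    return studies, false_pos, gain


def expected_studies_direct(n_f, n_t, pi_k, alpha, beta, p_pub):
    """E{n_K} summed case by case over the rank of the causal factor."""
    n_e1 = geometric_sum((1 - p_pub) * beta, n_t)
    n_f1 = geometric_sum((1 - p_pub) * (1 - alpha), n_t)
    g1 = (1 - beta) * n_e1
    total, covered = 0.0, 0.0
    for i in range(1, n_f + 1):
        w = pi_k * (1 - pi_k) ** (i - 1)
        total += w * (n_e1 + (i - 1) * n_f1 + (1 - g1) * (n_f - i) * n_f1)
        covered += w
    return total + (1 - covered) * n_f * n_f1  # no causal factor at all
```

For n F = 2, n t = 1, π k = 0.5, β = 0.2 and P pub = 1 the expected number of studies is 1.6, which can be confirmed by enumerating the three cases (F 1 causal, F 2 causal, neither causal) by hand.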

Parallel testing I
So far, we assumed that the research teams performed the studies consecutively. Let us now consider a scenario in which each factor is tested by all teams in parallel, but in which results are published consecutively. Then n e,1 = n t and n f,1 = n t , while g 1 and fp 1 remain unchanged. Therefore, E{fp} and E{g} are unchanged, too, and the following equation is obtained for the expected number of studies:
$$E\{n_K\} = \{ g_1 (f_2 - n_F f_1) + n_F \}\, n_t \tag{19}$$
Since in this scenario n e,1 and n f,1 no longer depend on P pub , publishing negative results would not increase the efficiency of research, i.e. decrease E{n total }.
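A small numerical sketch of this scenario follows. It simply substitutes n e,1 = n f,1 = n t into the expression for the expected number of studies, keeps g 1 in its sequential form as stated above, and assumes ω i = π k (1 − π k )^(i−1); the function name is hypothetical.

```python
def expected_studies_parallel(n_f, n_t, pi_k, beta, p_pub):
    """E{n_K} when every factor is tested by all n_t teams in parallel."""
    q = (1 - p_pub) * beta
    g1 = (1 - beta) * sum(q ** k for k in range(n_t))  # g_1 left unchanged
    omegas = [pi_k * (1 - pi_k) ** (i - 1) for i in range(1, n_f + 1)]
    f1 = sum(omegas)
    f2 = sum(i * w for i, w in enumerate(omegas, start=1))
    return n_t * (g1 * (f2 - n_f * f1) + n_f)
```

For a single factor the result is always n t , because all teams test it once; for n t = 1 the expression coincides with the consecutive scenario under P pub = 1.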

Parallel testing II (n p factors are tested in parallel)
Another possibility is that n p factors are tested in parallel. Then the expectation for the number of studies is:

Equations for arbitrary number of causal factors (n cF ≥ 1)
The model can be extended to the case of more than one causal factor. In this case the research problem is solved, if all n cF causal factors are detected.
In order to derive an equation for E{n K } for arbitrary n cF , we need an expression for the probability that F i = 1, if there were already n cF − 1 causal factors among the i − 1 factors so far considered. Let us denote this probability by $\omega_i^{end,n_{cF}}$.
Furthermore, a formula is needed for the probability P(j; n F , n cF ) that exactly j out of n cF causal factors are among the n F factors.
To obtain a formula for $\omega_i^{end,n_{cF}}$, let us consider an ordering of n F factors such that the last of n cF − 1 causal factors is placed on position j, i.e. there are n cF − 2 causal factors on positions preceding j. The probability for such an ordering is $\omega_j^{end,n_{cF}-1}$. The probability that an additional causal factor is placed on position i > j is given by $\pi_k (1 - \pi_k)^{i-(n_{cF}-1)-1}$. We therefore obtain the following recursion formula:
$$\omega_i^{end,n_{cF}} = \omega_i^{end,n_{cF}-1} \left( 1 - (1 - \pi_k)^{i-(n_{cF}-1)} \right) + \sum_{j=n_{cF}-1}^{i-1} \omega_j^{end,n_{cF}-1}\, \pi_k (1 - \pi_k)^{i-n_{cF}}, \quad n_{cF} > 1,\ i = n_{cF}, \ldots, n_F \tag{22}$$
The first summand of (22) takes into account the case that the last of n cF − 1 factors is placed on position i. The term $1 - (1 - \pi_k)^{i-(n_{cF}-1)}$ then describes the probability that the additional factor is placed on a position from 1 to i − 1. The base case of the recursion is given by $\omega_i^{end,1} = \omega_i = \pi_k (1 - \pi_k)^{i-1}$.
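The recursion can be evaluated iteratively, starting from the base case and adding one causal factor at a time. The sketch below is a direct transcription of the recursion as read from the text; the function name `omega_end` and the dictionary representation are illustrative choices.

```python
def omega_end(n_f, n_cf, pi_k):
    """Return {i: probability that the last of n_cf causal factors is on rank i}."""
    x = 1 - pi_k
    # Base case: a single causal factor, omega_i = pi_k * (1 - pi_k)^(i - 1).
    prev = {i: pi_k * x ** (i - 1) for i in range(1, n_f + 1)}
    for m in range(2, n_cf + 1):
        cur = {}
        for i in range(m, n_f + 1):
            # The previous last factor sits on rank i; the new one lands before it.
            first = prev[i] * (1 - x ** (i - (m - 1)))
            # The previous last factor sits on rank j < i; the new one lands on i.
            second = sum(prev[j] for j in range(m - 1, i)) * pi_k * x ** (i - m)
            cur[i] = first + second
        prev = cur
    return prev
```

For two causal factors and π k = 0.3, `omega_end(5, 2, 0.3)[2]` gives the probability that both occupy the first two ranks, which is π k ² (2 − π k ) = 0.153.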
To obtain a formula for P(j; n F , n cF ), we consider that there are two ways to choose j out of n cF causal factors among the n F factors.
Either j out of n cF − 1 causal factors were chosen among the n F factors and no additional causal factor will be chosen among the remaining n F − j factors, or j − 1 out of n cF − 1 causal factors were selected and one additional causal factor will be chosen among the n F − (j − 1) remaining factors. Therefore:
$$P(j; n_F, n_{cF}) = P(j; n_F, n_{cF} - 1)(1 - \pi_k)^{n_F - j} + P(j - 1; n_F, n_{cF} - 1)\left( 1 - (1 - \pi_k)^{n_F - (j - 1)} \right), \tag{23}$$
for j = 1, … , (n cF − 1). This recursion formula starts with $P(j = 0; n_F, n_{cF}) = \left( (1 - \pi_k)^{n_F} \right)^{n_{cF}}$ and P(j = 0; n F , n cF = 0) = 1, with P(j; n F , n cF ) = 0 for j > n cF and j < 0.
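The recursion lends itself to a memoized implementation. Note one assumption in the sketch below: the recursion is applied for all j from 1 to n cF (not only up to n cF − 1 as the stated range reads), because only then do the probabilities over j = 0, … , n cF sum to one. The function name is hypothetical.

```python
from functools import lru_cache


def prob_j_causal(j, n_f, n_cf, pi_k):
    """P(j; n_F, n_cF): exactly j of n_cf causal factors are among the n_f factors."""
    x = 1 - pi_k

    @lru_cache(maxsize=None)
    def p(j, m):
        if j < 0 or j > m:
            return 0.0                 # impossible counts
        if m == 0:
            return 1.0                 # no causal factor placed yet, j == 0
        if j == 0:
            return (x ** n_f) ** m     # all m causal factors fall outside
        stay = p(j, m - 1) * x ** (n_f - j)
        enter = p(j - 1, m - 1) * (1 - x ** (n_f - (j - 1)))
        return stay + enter

    return p(j, n_cf)
```

For a single causal factor, `prob_j_causal(1, 5, 1, 0.3)` equals 1 − 0.7⁵, the probability that the causal factor is among the five considered factors.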
The equation for E{n K } is then derived from the following consideration: If F i is the last causal factor, exactly n cF n e,1 + (i − n cF )n f,1 studies will be performed on average before the problem is solved, i.e. all n cF factors are discovered. The research problem is not solved with the probability $1 - g_1^{n_{cF}}$. Then an additional (n F − i)n f,1 studies will be performed on average. If there were only j < n cF causal factors among the n F considered factors, the expectation of the number of studies is equal to (n F − j)n f,1 + j n e,1 . Therefore we obtain:
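The verbal consideration above suggests an expression of the following shape. This is a sketch assembled from the description in the text, with n e,1 and n f,1 denoting the mean numbers of tests per causal and per non-causal factor; it should be read as an assumption and not as a verbatim reproduction of the authors' equation.

```latex
E\{n_K\} =
  \sum_{i=n_{cF}}^{n_F} \omega_i^{end,n_{cF}}
    \left[ n_{cF}\, n_{e,1} + (i - n_{cF})\, n_{f,1}
           + \left(1 - g_1^{\,n_{cF}}\right)(n_F - i)\, n_{f,1} \right]
  + \sum_{j=0}^{n_{cF}-1} P(j; n_F, n_{cF})
    \left[ (n_F - j)\, n_{f,1} + j\, n_{e,1} \right]
```

As a consistency check, for n cF = 1 the second sum reduces to the single term P(0; n F , 1) n F n f,1 with P(0; n F , 1) = (1 − π k )^(n F ), and the whole expression reduces to the single-causal-factor formula for E{n K }.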

Proof that the probability of a beta error (β min ) minimizing E{n total } decreases with π k and n F
We prove that the higher the pre-study probability (π k ), the higher the recommended statistical power 1 − β and therefore the sample size per study should be. The proof is given here only for n cF = 1. However, as we will see, the proposition depends only on three widely applicable conditions. We further prove that the higher the number of considered factors n F , the higher the recommended statistical power 1 − β and therefore the sample size per study should be. This proof is given only for P pub = 1, n cF = 1, n p = 1.
Definition of β min : The smallest probability of an error of the second kind β m , which fulfills E{n total }(β m ; π k , n F , α, δ, P pub ) ≤ E{n total }(β; π k , n F , α, δ, P pub ) for all β ∈ [0, 1 − α] for fixed π k , n F , α, δ, P pub is termed β min . If n F , α, δ, P pub are kept constant, β min is a function of π k denoted by β min (π k ). Similarly, the expectation of the number of studies is denoted by E{n K }(β, π k ).
Proposition: a) From π k 1 < π k 2 follows β min (π k 2 ) ≤ β min (π k 1 ). b) From n F1 < n F2 follows β min (n F2 ) ≤ β min (n F1 ).

S1 Fig 1: Schematic curves of the expectation of the total number of samples per research problem versus β for two different pre-study probabilities π k 1 and π k 2 (π k 1 < π k 2 ). The slope of the curves increases with π k . Therefore the second curve E{n total }(β; π k 2 ) is rising at least as long as the first curve E{n total }(β; π k 1 ) is rising. Then the new minimum can only be found left of the old one or at the same position, which is equivalent to the proposition.

Proof:
For two different pre-study probabilities π k 1 and π k 2 there are two different functions of the expectation of the total number of samples used for a research problem (illustrated in S1 Fig 1). The proposition is proven, if
$$E\{n_{total}\}(\beta_2, \pi_{k2}) - E\{n_{total}\}(\beta_{min}(\pi_{k1}), \pi_{k2}) \ge 0 \tag{27}$$
holds for all β 2 > β min (π k 1 ).
To see this, suppose that some β test > β min (π k 1 ) minimized E{n total }(β, π k 2 ). By (27), E{n total }(β min (π k 1 ), π k 2 ) would then be no larger than E{n total }(β test , π k 2 ), so that β min (π k 1 ) would be a minimizer as well. However, if this is true, then β test is not the smallest β minimizing E{n total }, i.e. β min (π k 2 ).
For illustration see S1 Fig 1. To prove (27), we first demonstrate that three relations are sufficient to prove the proposition. We then show that these conditions apply to our model.
The relations are the following: (i) n a (β; α, δ) ≤ n a (β min ; α, δ) for β > β min . Relation (i) confirms that for higher power a bigger sample size is needed if everything else is kept constant. Additionally, it is assumed that n a is constant throughout the research process. This also implies that prior probabilities are not directly used in the sample size determination. Relation (ii) means that we need fewer studies if our pre-study information is increased. Relation (iii) means that the impact of the statistical power on the number of studies is higher if the causal factors are tested earlier in the research process, which is more frequently the case the greater the pre-study information.
This inequality is equivalent to (27) and thereby proves the proposition.
The four intersection points in S1 Fig 1 illustrate inequality (29). The left side of the inequality corresponds to the vertical difference of intersection points 4 and 3, the second difference corresponds to that of intersection point 2 and 1. It means that the increase in the total number of samples is even greater from β min (π k 1 ) to β 2 , if π k is increased.
We now show that the relations (i), (ii) and (iii) apply to our model: Relation (i) can be assumed to be the case for all statistical tests. Relation (ii) follows from the differentiation with respect to π k of (16). In equation (16), two terms depend on π k :

Properties (ii) and (iii) for parallel testing
It can easily be seen that conditions (i)-(iii) also apply if parallel testing (I) of one factor by all teams is assumed. The proof can be derived if (19) is differentiated with respect to π k and β.
The case of n p factors tested in parallel (II) is slightly more complicated. Here the proof is given for P pub = 1 only. Then the derivative of (20) yields: … $(1 - \pi_k)^{n_p j - 1} + n_p \lfloor n_F / n_p \rfloor (1 - \pi_k)^{n_p \lfloor n_F / n_p \rfloor - 1} (n_F - n_p \lfloor n_F / n_p \rfloor)\}$ Since all summands in the curly bracket are positive, the derivative is negative and condition (ii) holds. Condition (iii) then follows directly from $\partial g_1 / \partial \beta \le 0$.

Proof of proposition b)
We start again with a sufficient condition for proposition b). Such a condition is given by
$$E\{n_{total}\}(\beta_2, n_F + 1) - E\{n_{total}\}(\beta_{min}(n_F), n_F + 1) \ge 0. \tag{32}$$
We now need to demonstrate that (33) applies to the model. Let us assume that P pub = 1. Then
$$\frac{\partial E\{n_K\}(n_F)}{\partial \beta} = \sum_{i=1}^{n_F} \omega_i (n_F - i)$$
holds.
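The derivative for P pub = 1 can be checked numerically. In this case every factor is tested exactly once (n e,1 = n f,1 = 1) and g 1 = 1 − β, so E{n K } is linear in β. The sketch assumes ω i = π k (1 − π k )^(i−1) and no bias; the function names are illustrative.

```python
def expected_studies(beta, n_f, pi_k):
    """E{n_K} for P_pub = 1, where n_e1 = n_f1 = 1 and g_1 = 1 - beta."""
    omegas = [pi_k * (1 - pi_k) ** (i - 1) for i in range(1, n_f + 1)]
    f1 = sum(omegas)
    f2 = sum(i * w for i, w in enumerate(omegas, start=1))
    return (1 - beta) * (f2 - n_f * f1) + n_f


def slope_in_beta(n_f, pi_k):
    """Sum of omega_i * (n_F - i): the claimed derivative with respect to beta."""
    return sum(pi_k * (1 - pi_k) ** (i - 1) * (n_f - i)
               for i in range(1, n_f + 1))
```

Because the function is linear in β, the difference quotient between any two β values reproduces `slope_in_beta` up to rounding, and the slope grows when a further factor is added.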