On the effect of flexible adjustment of the p value significance threshold on the reproducibility of randomized clinical trials

doi:10.1371/journal.pone.0325920

Fig 1.

Relative frequency of type I (dark gray) and type II (light gray) errors in 10 000 replicas for each initial condition of the simulation.

The values are shown for different sample sizes per study arm, ratios of the standard deviations, and criteria used to determine the p value significance threshold: orange, p < 0.05; red, p < 0.005; blue, the flexible p significance threshold assuming equal variances; and green, the flexible p significance threshold for 3 variance ratios. s₁ and s₂ are the standard deviations in the placebo and treatment groups, respectively.
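
The kind of replica counting this caption describes can be illustrated with a minimal Python sketch. It is not the authors' R program (S1 File) and makes several assumptions: the function names `two_sample_p` and `error_rates` are hypothetical, s₁ is fixed at 1 so that `sd_ratio` plays the role of s₂/s₁, the p value comes from a normal approximation to the unequal-variance two-sample test rather than an exact t test, and only the two fixed thresholds are compared. The flexible threshold is not implemented here.

```python
import random
from statistics import NormalDist, mean, variance


def two_sample_p(x, y):
    """Two-sided p value, normal approximation to the unequal-variance test (assumption)."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    z = (mean(x) - mean(y)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def error_rates(n, sd_ratio, effect, alpha, replicas=2000, seed=1):
    """Return (type I rate, type II rate) over simulated two-arm trials."""
    rng = random.Random(seed)
    type1 = type2 = 0
    for _ in range(replicas):
        # Placebo arm: mean 0, standard deviation s1 = 1 (assumption).
        placebo = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # H0 true: equal means, so a significant result is a type I error.
        null_arm = [rng.gauss(0.0, sd_ratio) for _ in range(n)]
        type1 += two_sample_p(placebo, null_arm) < alpha
        # H1 true: means differ by `effect`, so a non-significant result is a type II error.
        alt_arm = [rng.gauss(effect, sd_ratio) for _ in range(n)]
        type2 += two_sample_p(placebo, alt_arm) >= alpha
    return type1 / replicas, type2 / replicas


if __name__ == "__main__":
    for alpha in (0.05, 0.005):
        t1, t2 = error_rates(n=50, sd_ratio=1.5, effect=0.5, alpha=alpha)
        print(f"alpha={alpha}: type I={t1:.3f}, type II={t2:.3f}")
```

Because both calls reuse the same seed, the two thresholds are evaluated on identical simulated data, so the stricter threshold can only lower the type I rate and raise the type II rate in this sketch.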

Fig 2.

The computed weighted sum of type I (dark gray) and type II (light gray) errors (Eq. 4) in 10 000 replicas for each initial condition of the simulation.

The values are shown for different sample sizes per study arm, ratios of the standard deviations, and criteria used to determine the p value significance threshold: orange, p < 0.05; red, p < 0.005; blue, the flexible p significance threshold assuming equal variances; and green, the flexible p significance threshold for 3 variance ratios. s₁ and s₂ are the standard deviations in the placebo and treatment groups, respectively.

Fig 3.

The computed weighted sum of type I (dark gray) and type II (light gray) errors (Eq. 4) in 10 000 replicas for each initial condition of the simulation.

The values are shown for different sample sizes per study arm, a standard deviation ratio (s₂/s₁) of 0.5, a minimum effect size of interest of 0.5, and different prior probabilities (pr) that H₁ is true. In this simulation, type I and type II errors were taken to be equally serious. The line color indicates the criterion used to determine the p value significance threshold: orange, p < 0.05; red, p < 0.005; blue, the flexible p significance threshold assuming equal variances; and green, the flexible p significance threshold.

Fig 4.

The computed weighted sum of type I (dark gray) and type II (light gray) errors (Eq. 4) in 10 000 replicas for each initial condition of the simulation.

The values are shown for different sample sizes per study arm, a standard deviation ratio (s₂/s₁) of 1.5, a minimum effect size of interest of 0.5, different prior probabilities (pr) that H₁ is true, and different values for the seriousness of type II error relative to type I error (C). The line color indicates the criterion used to determine the p value significance threshold: orange, p < 0.05; red, p < 0.005; blue, the flexible p significance threshold assuming equal variances; and green, the flexible p significance threshold.

Fig 5.

Variation of the most appropriate p value significance threshold with different ratios of the standard deviations and sample sizes.

The bottom and top borders of each box indicate the 25th and 75th percentiles, respectively; the colored horizontal line in the middle of each box shows the median. The lower whisker represents the smallest value within 1.5 times the interquartile range (IQR) below the 25th percentile; the upper whisker represents the largest value within 1.5 times the IQR above the 75th percentile. Open circles represent outliers. The gray horizontal line indicates the p value significance threshold for each sample size assuming equal variances. Note that the ordinate has a logarithmic scale; higher values correspond to lower p significance thresholds. s₁ and s₂ are the standard deviations in the placebo and treatment groups, respectively.
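
The box and whisker convention described in this caption can be sketched in a few lines of Python. This is an illustration only: the function name `box_stats` is hypothetical, and the quartiles use the "inclusive" interpolation of Python's `statistics.quantiles`, which may differ slightly from the quartile rule used by the plotting software behind the figure.

```python
from statistics import quantiles


def box_stats(values):
    """Quartiles, whisker ends, and outliers under the 1.5 x IQR rule."""
    data = sorted(values)
    # Inclusive interpolation is an assumption; other quartile rules exist.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "lower_whisker": inside[0],
        "upper_whisker": inside[-1],
        # Points beyond the fences are drawn individually (open circles).
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }


if __name__ == "__main__":
    print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

For the sample list in the usage line, the value 100 lies beyond the upper fence and is reported as an outlier, so the upper whisker stops at 9.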

Table 1.

Pseudocode of the simulation program (see S1 File in the Supplementary Materials for the R code).
