Fig 1.
Relative frequency of type I (dark gray) and type II (light gray) errors in 10 000 replicas for each initial condition of the simulation.
Values are shown for different sample sizes per study arm, ratios of the standard deviations, and criteria used to determine the p value significance threshold: orange, p < 0.05; red, p < 0.005; blue, the flexible p significance threshold assuming equal variances; and green, the flexible p significance threshold for 3 variance ratios. s1 and s2 are the standard deviations in the placebo and treatment groups, respectively.
Fig 2.
The computed weighted sum of type I (dark gray) and type II (light gray) errors (Eq. 4) in 10 000 replicas for each initial condition of the simulation. Values are shown for different sample sizes per study arm, ratios of the standard deviations, and criteria used to determine the p value significance threshold: orange, p < 0.05; red, p < 0.005; blue, the flexible p significance threshold assuming equal variances; and green, the flexible p significance threshold for 3 variance ratios. s1 and s2 are the standard deviations in the placebo and treatment groups, respectively.
Fig 3.
The computed weighted sum of type I (dark gray) and type II (light gray) errors (Eq. 4) in 10 000 replicas for each initial condition of the simulation. Values are shown for different sample sizes per study arm, a standard deviation ratio (s2/s1) of 0.5, a minimum effect size of interest of 0.5, and different values of the prior probability (pr) that H1 is true. In this simulation, type I and type II errors were assumed to be equally serious. The line color indicates the criterion used to determine the p value significance threshold: orange, p < 0.05; red, p < 0.005; blue, the flexible p significance threshold assuming equal variances; and green, the flexible p significance threshold.
Fig 4.
The computed weighted sum of type I (dark gray) and type II (light gray) errors (Eq. 4) in 10 000 replicas for each initial condition of the simulation. Values are shown for different sample sizes per study arm, a standard deviation ratio (s2/s1) of 1.5, a minimum effect size of interest of 0.5, different values of the prior probability (pr) that H1 is true, and different values of the seriousness of a type II error relative to a type I error (C). The line color indicates the criterion used to determine the p value significance threshold: orange, p < 0.05; red, p < 0.005; blue, the flexible p significance threshold assuming equal variances; and green, the flexible p significance threshold.
Fig 5.
Variation of the most appropriate p value significance threshold with different ratios of the standard deviations and sample sizes.
The bottom and top borders of each box indicate the 25th and 75th percentiles, respectively; the colored horizontal line in the middle of each box shows the median. The lower whisker extends to the smallest value lying no more than 1.5 times the interquartile range (IQR) below the 25th percentile; the upper whisker extends to the largest value lying no more than 1.5 times the IQR above the 75th percentile. Open circles represent outliers. The gray horizontal line indicates the p value significance threshold for each sample size assuming equal variances. Note that the ordinate has a logarithmic scale; higher values correspond to lower p significance thresholds. s1 and s2 are the standard deviations in the placebo and treatment groups, respectively.
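For readers who want to check the box-plot conventions described in the Fig 5 caption, the following is a minimal Python sketch of how the box, whiskers, and outliers can be derived from a set of values. It is an illustration only and is not taken from the authors' R code: the function name `box_stats`, the sample data, and the choice of linear-interpolation quartiles are assumptions, and plotting software may compute quartiles slightly differently.

```python
import statistics


def box_stats(values, k=1.5):
    """Return box-plot statistics using the 1.5 x IQR whisker rule.

    Hypothetical helper: quartiles use the 'inclusive' linear-interpolation
    method, which is an assumption and may differ from the plotting
    software used for the figure.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": outliers,
    }


# Made-up data purely for illustration (not from the simulation).
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

With this made-up sample, the value 100 lies beyond the upper fence and would be drawn as an open circle, while the upper whisker stops at 9.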
Table 1.
Pseudocode of the simulation program (see S1 File in the Supplementary Materials for the R code).