The generalized inference on the ratio of mean differences for fraction retention noninferiority hypothesis

The fraction retention non-inferiority hypothesis is often formulated in medical research as the ratio of the effect of a new treatment to that of the control. However, the fraction retention non-inferiority test of whether the new treatment maintains the efficacy of the control can be affected by nuisance parameters. Herein, a heuristic procedure for testing the fraction retention non-inferiority hypothesis is proposed based on the generalized p-value (GPV) under the normality assumption and heteroskedasticity. A simulation study demonstrates that the GPV-based method not only adequately controls the type I error rate at the nominal level but is also uniformly more powerful than the comparable extant methods, namely the ratio test, Rothmann's test and Wang's test. Finally, we illustrate the proposed method with a real example.


Introduction
The purpose of drug development in clinical trials is often to prove that the experimental treatment (new therapy) is superior to an active control (standard therapy) or placebo. From an ethical point of view, in mortality or severe morbidity trials, placebo or untreated controls cannot be used as long as an effective treatment exists for conditions that can lead to death or severe irreversible morbidity. If ethically justifiable, it may be advisable to include a placebo group for internal validation [1,2]. Traditionally, clinical researchers hope to show that the experimental treatment is superior to the active control. However, when the experimental treatment offers other advantages over the control (for example, better safety or ease of administration), a non-inferiority trial (NI trial) can be used for validation [3]. The goal of the NI trial is to conclude that the experimental treatment is more effective than a placebo and is not unacceptably less effective than the active control.
PLOS ONE | https://doi.org/10.1371/journal.pone.0234432 | June 9, 2020
The fixed margin method and the synthesis method are widely used to test non-inferiority hypotheses in NI trials. The Food and Drug Administration (FDA) has published regulatory guidance on the fixed margin and synthesis methods used in the design of NI trials [4]. The fixed margin method first defines a non-inferiority margin and then demonstrates that the experimental treatment is not worse than the control by more than that margin [5]. The purpose of the fraction retention hypothesis is to test what percentage of the active control effect the new treatment can retain. The synthesis method based on retention is used to test whether a new treatment can retain a given fraction of the active control effect. Some researchers have proposed methods for statistical verification of the retention NI hypothesis.
For example, Rothmann et al. [6] proposed one test method, hereafter referred to as Rothmann's test. Rothmann's test assumes the standard drug's efficacy to be positive, but in practice this assumption is not necessarily reasonable. Moreover, Rothmann's test has low power for small sample sizes. Wang et al. [7] proposed another test developed from asymptotic normality theory, hereafter referred to as Wang's test. For small sample sizes, Wang's test is more powerful than Rothmann's test. Additionally, in clinical trials, the sample size required for Wang's test has been determined to be smaller than that required for Rothmann's test. Nevertheless, the test statistic in Wang's test is assumed to have homogeneous variance under both the null and the alternative hypotheses. In practice, assuming homogeneity in NI trials is inappropriate. More recently, Deng and Chen [8] derived a ratio test from the Cauchy-like distribution proposed by Marsaglia [9]. In the ratio test, the standard drug efficacy is not assumed to be positive, and the test statistic is not assumed to have homogeneous variance under the null and alternative hypotheses. In addition, the ratio test is more powerful than the two aforementioned tests. Nevertheless, the critical value of the Cauchy-like distribution must be calculated by numerical integration, which complicates the calculation. Furthermore, the retention NI test procedure can be affected by nuisance parameters. Because of the complexity of the sampling distribution of the test statistics, it is difficult to assess the fraction retention non-inferiority hypothesis on the ratio of mean differences. In the present study, a heuristic statistical testing procedure on the ratio of mean differences for the retention NI hypothesis is developed on the basis of the concept of generalized test variables (GTVs). This procedure, called the generalized p-value-based (GPV-based) method, makes it more convenient than the ratio test to calculate the type I error rate and empirical power, because it avoids complicated computation.
The p-value of a statistical test is calculated by using sample observations as the critical value for the test. If nuisance parameters exist in the test procedure, the p-value may depend on them and cannot be easily calculated. To overcome this problem, Tsui and Weerahandi [10] generalized the rejection region of the test so that the calculation of the p-value is independent of the nuisance parameters, and provided a GPV test procedure. Tsui and Weerahandi [10] gave explicit definitions of the generalized test variable (GTV) and the GPV, and showed that the GPV is an exact probability of an extreme region. For the definitions of the GTV and GPV, please refer to the S1 Appendix.
Tsui and Weerahandi [10] successfully used GPVs to provide small-sample solutions for hypothesis testing problems in which nuisance parameters are present and testing procedures are difficult to obtain. Subsequently, Weerahandi [11] introduced generalized pivotal quantities (GPQs) to construct generalized confidence intervals (GCIs) for parameters of interest in the presence of nuisance parameters. GPVs have been successfully applied to various hypothesis testing topics, such as comparing accuracy by examining receiver operating characteristic curves in gold standard situations [12,13], tolerance intervals for random effects models [14,15], evaluation of dissolution profile similarity [16], GCIs in a linear measurement error model [17], comparing accuracy by examining receiver operating characteristic curves when a gold standard does not exist [18], applying a delta-lognormal distribution to trawl survey data [19], and generalized inference on the sign testing problem about the normal variance [20].
In this study, the existing methods on the ratio of mean differences for the fraction retention NI hypothesis are first reviewed. We then propose the GPV-based method, which is a heuristic statistical testing procedure. The type I error rate and empirical power of the GPV-based method are examined in simulation studies, and its performance is compared with that of Rothmann's, Wang's and the ratio tests. The methods are illustrated using published data, and conclusions are presented in the final section. S1 and S2 Appendices detail the underlying definitions of generalized inference; the reader is also referred to [10] and [11].

Ratio test, Rothmann's test and Wang's test
In this study, we use the same notation and definitions as Deng and Chen [8] so that the proposed method can be compared directly. In a three-arm NI trial, $T_{NI}$, $C_{NI}$ and $P_{NI}$ represent the effects of the experimental treatment, control, and placebo, respectively, and $C_H$ and $P_H$ denote the effects of the control and placebo obtained from the historical trials. Let $\hat{T}_{NI}$, $\hat{C}_{NI}$, $\hat{P}_{NI}$, $\hat{C}_H$ and $\hat{P}_H$ represent the estimates of $T_{NI}$, $C_{NI}$, $P_{NI}$, $C_H$ and $P_H$, respectively. Define $\theta_1 = T_{NI} - P_{NI}$ and $\theta_2 = C_H - P_H$ to denote the effects of the new treatment and the control, respectively. Hence, the estimates of $\theta_1$ and $\theta_2$ can be written as $\hat{\theta}_1 = \hat{T}_{NI} - \hat{P}_{NI}$ and $\hat{\theta}_2 = \hat{C}_H - \hat{P}_H$, respectively. In this research, we analyze the fraction retention NI hypothesis testing problem in the following form:
$$H_0: \delta \le \delta_0 \quad \text{versus} \quad H_1: \delta > \delta_0, \tag{1}$$
where the null hypothesis is represented by $H_0$, the alternative hypothesis is represented by $H_1$, $\delta = \theta_1/\theta_2$, and $\delta_0$ ($0 < \delta_0 < 1$) denotes the given level of fraction retention. The hypothesis presented in (1) can be rewritten as
$$H_0: \frac{\theta_1}{\theta_2} \le \delta_0 \quad \text{versus} \quad H_1: \frac{\theta_1}{\theta_2} > \delta_0. \tag{2}$$
Next, we introduce the ratio, Rothmann's and Wang's tests.
Regarding the hypothesis presented in (1), Deng and Chen [8] proposed the ratio test on the basis of the Cauchy-like distribution proposed by Marsaglia [9]. The ratio test is a non-standardized test, and for hypothesis (2), the test statistic of the ratio test has the form
$$Z = \frac{\hat{\theta}_1}{\hat{\theta}_2}.$$
Under the constancy assumption, namely $C_{NI} - P_{NI} = C_H - P_H$, the test statistic of the ratio test can be rewritten as
$$\hat{Z} = 1 - \frac{\hat{C}_{NI} - \hat{T}_{NI}}{\hat{C}_H - \hat{P}_H}.$$
Furthermore, Deng and Chen [8] assumed that $\hat{C}_{NI} - \hat{T}_{NI}$ and $\hat{C}_H - \hat{P}_H$ are normally distributed with means $\mu_{NI}$ and $\mu_H$ and variances $\sigma^2_{NI}$ and $\sigma^2_H$. That is,
$$\hat{C}_{NI} - \hat{T}_{NI} \sim N(\mu_{NI}, \sigma^2_{NI}), \qquad \hat{C}_H - \hat{P}_H \sim N(\mu_H, \sigma^2_H),$$
where $\mu_{NI}$ and $\mu_H$ indicate the mean effect of the new treatment relative to the active control in the NI trial and the mean effect of the active control relative to the placebo in the historical trial, respectively; $\sigma^2_{NI}$ and $\sigma^2_H$ indicate the corresponding variances. Therefore, letting $\theta_1 = \theta_2 - \mu_{NI}$ and $\theta_2 = \mu_H$, the distribution of the estimated effect of the new treatment relative to placebo can be written as
$$\hat{\theta}_1 \sim N(\theta_2 - \mu_{NI},\ \sigma^2_{NI} + \sigma^2_H).$$
Because of the complexity of the distribution of $\hat{Z}$ for the aforementioned NI hypothesis parameter settings, Deng and Chen [8] used the linear transformation method proposed by Marsaglia [9]. Accordingly, the distribution of $\hat{Z}$ is equivalent to that of
$$r\,\frac{a + X}{b + Y} + 1,$$
where $a = -\mu_{NI}/\sigma_{NI}$, $b = \theta_2/\sigma_H$, $r = \sigma_{NI}/\sigma_H$, and $X$ and $Y$ are two independent standard normal random variables. In the NI trial, $a$, $b$, and $r$ denote the standardized difference between the treatment and control effects, the standardized efficacy of the control versus the placebo as determined in the historical data, and the ratio of the standard deviations of $\hat{C}_{NI} - \hat{T}_{NI}$ and $\hat{C}_H - \hat{P}_H$, respectively. Consider the parameter set $(b, r, \delta)$, and let $g_{b,r,\delta}(t)$ be the probability density function and $G_{b,r,\delta}(t)$ be the cumulative distribution function of $r(a + X)/(b + Y) + 1$.
Accordingly, $G^{-1}_{b,r,\delta}(t)$ denotes the quantile function of $G_{b,r,\delta}(t)$. When $b$ and $r$ are known, the rejection region of the ratio test is $\{\hat{Z} > G^{-1}_{b,r,\delta_0}(1 - \alpha)\}$; that is, if $\hat{Z} > G^{-1}_{b,r,\delta_0}(1 - \alpha)$, the null hypothesis $H_0$ displayed in (1) can be rejected. If $b$ and $r$ are unknown, the estimated values of $b$ and $r$ can be used to calculate the critical value of the rejection region for the ratio test.
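As a rough illustration of the critical value $G^{-1}_{b,r,\delta_0}(1-\alpha)$, the sketch below estimates it by Monte Carlo simulation rather than by the numerical integration used for the ratio test, so it is a swapped-in approximation and not the method of Deng and Chen [8]. It relies on the relation $a = -(1-\delta)b/r$, which follows from $a = -\mu_{NI}/\sigma_{NI}$, $b = \mu_H/\sigma_H$, $r = \sigma_{NI}/\sigma_H$ and $\delta = 1 - \mu_{NI}/\mu_H$. The function name, the number of draws and the seed are illustrative assumptions.

```python
import numpy as np

def ratio_test_critical_value(b, r, delta0, alpha=0.025, n_draws=200_000, seed=0):
    """Monte Carlo estimate of the (1 - alpha) quantile of r(a + X)/(b + Y) + 1.

    Hypothetical helper: it approximates the quantile by simulation instead of
    numerical integration of the Cauchy-like density.
    """
    rng = np.random.default_rng(seed)
    # On the null boundary delta = delta0, a is determined by (b, r, delta0):
    # a = -mu_NI / sigma_NI = -(1 - delta0) * b / r.
    a = -(1.0 - delta0) * b / r
    x = rng.standard_normal(n_draws)  # X ~ N(0, 1)
    y = rng.standard_normal(n_draws)  # Y ~ N(0, 1), independent of X
    z = r * (a + x) / (b + y) + 1.0
    return float(np.quantile(z, 1.0 - alpha))

# Example: critical values for the three standardized control effects with r = 1.
for b in (2, 3, 4):
    print(b, round(ratio_test_critical_value(b=b, r=1.0, delta0=0.5), 3))
```

The null hypothesis would then be rejected when the observed $\hat{Z}$ exceeds the returned value. For small $b$ the ratio has heavy tails, so a large number of draws is advisable.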
Furthermore, for the fraction retention NI hypothesis presented in (1), Deng and Chen [8] derived an equivalent form of the rejection region of Rothmann's test, denoted $C_{\text{Rothmann's test}}$, which is expressed in terms of $z_\alpha$, the upper $\alpha$ critical point of the standard normal distribution; $H_0$ in (1) is rejected when the test statistic falls in this region. In addition, for the fraction retention NI hypothesis stated in (1), Deng and Chen [8] derived an equivalent form of the rejection region of Wang's test, denoted $C_{\text{Wang's test}}$, which is expressed in terms of $\delta_0$ and $z_{1-\alpha}$, the upper $1 - \alpha$ critical point of the standard normal distribution. In practice, the true values of $\sigma^2_H$ and $\sigma^2_{NI}$ are usually unknown, but corresponding estimates can be made on the basis of historical data and assumptions from current NI trials [8].

Proposed method: GPV-based method
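In this section, we propose a heuristic statistical testing procedure on the ratio of mean differences for the retention NI hypothesis, which is based on the concept of GTVs. Let $X_H = \{X_1, \ldots, X_{n_H}\}$ and $Y_{NI} = \{Y_1, \ldots, Y_{n_{NI}}\}$ denote two independent random samples drawn from normal distributions with means $\mu_H$, $\mu_{NI}$ and variances $\sigma^2_H$, $\sigma^2_{NI}$, respectively. Without loss of generality, these random samples can be assumed to be mutually independent. We denote by $\bar{x}_H$, $\bar{y}_{NI}$ and $s^2_H$, $s^2_{NI}$ the observed sample means and sample variances (with divisors $n_H - 1$ and $n_{NI} - 1$). For hypothesis (2), letting $\theta_2 = \mu_H$ and $\theta_2 - \theta_1 = \mu_{NI}$, so that $\delta = 1 - \mu_{NI}/\mu_H$, the hypothesis displayed in (2) can be rewritten as
$$H_0: \frac{\mu_{NI}}{\mu_H} \ge 1 - \delta_0 \quad \text{versus} \quad H_1: \frac{\mu_{NI}}{\mu_H} < 1 - \delta_0. \tag{8}$$
Because the GTV can be constructed from the GPQ [11], to develop the GTV for testing the hypothesis presented in (8), we adopt Weerahandi's concepts [11] to construct the GPQs for $\mu_H$ and $\mu_{NI}$. For the definition of the GPQ, please refer to the S2 Appendix. Hence, we define the GPQs for $\mu_H$ and $\mu_{NI}$ as follows:
$$R_{\mu_H} = \bar{x}_H - \frac{Z_H}{\sqrt{U_H/(n_H - 1)}}\,\frac{s_H}{\sqrt{n_H}}, \tag{10}$$
$$R_{\mu_{NI}} = \bar{y}_{NI} - \frac{Z_{NI}}{\sqrt{U_{NI}/(n_{NI} - 1)}}\,\frac{s_{NI}}{\sqrt{n_{NI}}}, \tag{12}$$
where $Z_H \sim N(0, 1)$, $Z_{NI} \sim N(0, 1)$, $U_H \sim \chi^2(n_H - 1)$ and $U_{NI} \sim \chi^2(n_{NI} - 1)$. Here $\chi^2(n_H - 1)$ and $\chi^2(n_{NI} - 1)$ denote the chi-square distributions with $n_H - 1$ and $n_{NI} - 1$ degrees of freedom, respectively. Also, $Z_H$, $Z_{NI}$, $U_H$ and $U_{NI}$ are mutually independent. From (10) and (12), the GTV for hypothesis (8) is constructed and the GPV is estimated by Monte Carlo simulation; in Step 2 of that procedure, $R_{\mu_H,m}$ and $R_{\mu_{NI},m}$ are calculated for the $m$-th draw using (10) and (12).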
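A minimal sketch of how such a Monte Carlo GPV could be computed is given below. It is not the authors' program (their R code is in S3 Appendix). It assumes the standard GPQ for a normal mean, $R_\mu = \bar{x} - \frac{Z}{\sqrt{U/(n-1)}}\frac{s}{\sqrt{n}}$, and it assumes that the GPV is estimated as the proportion of draws consistent with the null hypothesis $\mu_{NI} \ge (1-\delta_0)\mu_H$, which matches $\mu_{NI}/\mu_H \ge 1-\delta_0$ when the control effect is positive. The function name and arguments are hypothetical.

```python
import numpy as np

def gpv_fraction_retention(xbar_h, s2_h, n_h, ybar_ni, s2_ni, n_ni,
                           delta0, n_draws=10_000, seed=0):
    """Monte Carlo generalized p-value for H0: mu_NI >= (1 - delta0) * mu_H.

    Inputs are the observed sample means, sample variances (divisor n - 1)
    and sample sizes of the historical (H) and non-inferiority (NI) samples.
    The decision rule below is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    # Draw the pivotal ingredients: standard normals and chi-squares.
    z_h = rng.standard_normal(n_draws)
    z_ni = rng.standard_normal(n_draws)
    u_h = rng.chisquare(n_h - 1, n_draws)
    u_ni = rng.chisquare(n_ni - 1, n_draws)
    # Generalized pivotal quantities for the two means.
    r_mu_h = xbar_h - z_h / np.sqrt(u_h / (n_h - 1)) * np.sqrt(s2_h / n_h)
    r_mu_ni = ybar_ni - z_ni / np.sqrt(u_ni / (n_ni - 1)) * np.sqrt(s2_ni / n_ni)
    # Share of draws that fall in the null region; a small value favors H1.
    return float(np.mean(r_mu_ni >= (1.0 - delta0) * r_mu_h))

# Example with made-up summary statistics (illustrative only).
print(gpv_fraction_retention(0.24, 0.09, 30, 0.05, 0.09, 30, delta0=0.5))
```

Under this sketch, $H_0$ would be rejected at level $\alpha$ when the returned value is below $\alpha$.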

Simulation studies
The simulation study includes three scenarios. First, the type I error rate obtained using the GPV-based method is compared with those obtained using the ratio, Rothmann's and Wang's tests. Second, the empirical powers of the four tests are evaluated and their performances are compared. Third, the performance of the GPV-based method is compared across different sample sizes.

Simulation study I: Type I error rate
We use the same simulation parameters as Deng and Chen [8] to compare the GPV-based method, the ratio test, Rothmann's test and Wang's test. The retention rate in the NI hypothesis is fixed at $\delta_0 = 0.5$ in this study. The mean effect of the active control relative to the placebo in the historical trial, $\mu_H$, is set to 0.24. To evaluate the type I error rate, the standardized control effect versus placebo in the historical trial, $b$, is considered for three cases: (i) $b = 2$, (ii) $b = 3$, and (iii) $b = 4$. The quantity $b/r$ is the mean efficacy of the active control relative to the placebo in historical trials divided by the standard deviation of the efficacy of the new treatment relative to the active control in the NI trial, and it is set to 2, 4 and 8. Additionally, for the GPV-based method, the sample size is set to 30. Data are independently generated 10000 times for each combination of parameters, and the type I error rate of each method is estimated from the generated data sets. For the GPV-based method, 10000 Monte Carlo draws are performed for each data set. Given the 2.5% nominal significance level and 10000 simulated data sets, about 95% of the estimated type I error rates are expected to lie between 0.0219 and 0.0281, that is, within $0.025 \pm 1.96\sqrt{0.025 \times 0.975/10000}$. Table 1 presents the type I error rates obtained from the simulations.
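The bounds 0.0219 and 0.0281 can be checked with a short calculation. The sketch below assumes the usual normal approximation to the binomial proportion with a two-sided 95% multiplier, which reproduces the quoted bounds; the function name is illustrative.

```python
import math

def monte_carlo_band(alpha, n_sim, z=1.959964):
    """Normal-approximation band for an estimated rejection rate.

    alpha : nominal level being estimated
    n_sim : number of simulated data sets
    z     : standard normal multiplier (1.959964 gives a two-sided 95% band)
    """
    se = math.sqrt(alpha * (1.0 - alpha) / n_sim)  # binomial standard error
    return alpha - z * se, alpha + z * se

lower, upper = monte_carlo_band(0.025, 10_000)
print(f"{lower:.4f} {upper:.4f}")  # prints "0.0219 0.0281"
```

An estimated type I error rate outside this band would suggest that a test is either conservative or liberal at the 2.5% level.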
The type I error rates obtained using the GPV-based method are in the range (0.0240, 0.0253); the corresponding ranges obtained using the ratio, Rothmann's and Wang's tests are (0.0232, 0.0276), (0.0225, 0.0279) and (0.0226, 0.0371), respectively. Most of the type I error rates are suitably near the nominal level of 0.025. For the values of $\delta_0$ considered, the type I error rates obtained using the four tests are well controlled. However, the proportion of settings with a type I error rate below 0.025 is higher for the GPV-based method than for the other three tests. Consequently, for the statistical testing of the fraction retention NI hypothesis, the GPV-based method adequately maintains the type I error rate close to the nominal level of 0.025.

Simulation study II: Empirical power
The empirical power of the GPV-based method is evaluated through another simulation study. We again employ the same simulation parameters as Deng and Chen [8]. The level of fraction retention $\delta_0$ is set to 0.5. Fig 1 illustrates the results of the simulations.
As $b$ and $b/r$ increase, the empirical power of all four methods increases considerably. Simulations executed under different values of $b$ and $b/r$ reveal that Rothmann's test has similar empirical power to Wang's test. Furthermore, the empirical power of the GPV-based method is uniformly higher than that of the other three tests.

Simulation study III: The performance of the GPV-based method under equal sample size
We conducted a simulation study of the performance of the GPV-based method under other sample sizes. We consider balanced sample sizes with $b = 2$, $b/r = 2, 4, 8$, $\delta_0 = 0.5$ and $\delta_1 = 0.625$. Table 2 details the type I error rate and empirical power results. In simulation study III, the GPV-based method still exhibits adequate size control at the nominal level under the different sample sizes. Furthermore, the empirical power increases with increasing sample size.
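To make the simulation settings concrete, the sketch below converts $(b, b/r, \delta)$ into the underlying means and standard deviations using the definitions $b = \mu_H/\sigma_H$, $r = \sigma_{NI}/\sigma_H$ and $\delta = 1 - \mu_{NI}/\mu_H$. It assumes that $\mu_H = 0.24$ from simulation study I carries over to this study; the function name is hypothetical, and how these quantities were turned into individual observations is not specified here.

```python
def simulation_parameters(mu_h, b, b_over_r, delta):
    """Map (b, b/r, delta) to (mu_NI, sigma_H, sigma_NI, r).

    Uses b = mu_H / sigma_H, b / r = mu_H / sigma_NI and
    delta = 1 - mu_NI / mu_H.
    """
    sigma_h = mu_h / b            # standard deviation on the historical side
    sigma_ni = mu_h / b_over_r    # standard deviation on the NI side
    mu_ni = (1.0 - delta) * mu_h  # mean of control minus treatment in the NI trial
    return {"mu_ni": mu_ni, "sigma_h": sigma_h,
            "sigma_ni": sigma_ni, "r": sigma_ni / sigma_h}

# Null (delta0 = 0.5) and alternative (delta1 = 0.625) settings for b = 2.
for b_over_r in (2, 4, 8):
    print(b_over_r,
          simulation_parameters(0.24, 2, b_over_r, 0.5),
          simulation_parameters(0.24, 2, b_over_r, 0.625))
```

With $b = 2$ and $b/r = 2$, for example, this gives $\sigma_H = \sigma_{NI} = 0.12$, $\mu_{NI} = 0.12$ under $\delta_0 = 0.5$ and $\mu_{NI} = 0.09$ under $\delta_1 = 0.625$.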

Numerical example
In this section, we consider the Xeloda trial [21] presented by Rothmann et al. [4], Wang et al. [7], and Deng and Chen [8]. For comparison, we analyze the NI trial with the fraction retention NI hypothesis presented in (1). Xeloda is an oral chemotherapeutic drug that is converted into 5-fluorouracil (5-FU). The approved first-line chemotherapy for metastatic colorectal cancer is 5-fluorouracil with leucovorin (5-FU/LV), which can be administered only through intravenous infusion. The Xeloda New Drug Application (NDA) was submitted to the FDA in 2001 and contained two randomized trials. Each trial involved approximately 600 subjects and compared Xeloda (the new treatment) with 5-FU/LV (the control treatment). The NI hypothesis in these trials was that Xeloda would retain 50% or more of the effect of 5-FU/LV compared with 5-FU alone. Many studies on metastatic colorectal cancer have compared 5-FU/LV with 5-FU as the first-line treatment. In a random-effects meta-analysis of 10 studies, the historical 5-FU/LV effect was estimated to be approximately 0.2341, with a standard error of approximately 0.0750 [21]. Hence, the observed standardized control effect was 3.1213. In a random-effects meta-analysis of 8 historical studies, the historical 5-FU/LV effect and corresponding standard error were estimated to be 0.2398 and 0.0593, respectively [21]. The observed standardized control effect was thus 4.0438. Table 3 presents the results, including p-values, for the two separate Xeloda trials and the pooled analysis in the intent-to-treat (ITT) population, with calculations performed using Rothmann's, Wang's and ratio tests and the GPV-based method. The results indicate that the p-values obtained for the GPV-based method are uniformly lower than those obtained for the other three tests. Therefore, in this case, the GPV-based method appears to be more powerful than the other three tests.
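The two standardized control effects quoted above are simply the estimated historical effect divided by its standard error. The short check below reproduces them; the function name is illustrative.

```python
def standardized_effect(estimate, standard_error):
    """Estimated control effect divided by its standard error."""
    return estimate / standard_error

# Meta-analysis of 10 studies and of 8 studies, respectively.
print(round(standardized_effect(0.2341, 0.0750), 4))  # 3.1213
print(round(standardized_effect(0.2398, 0.0593), 4))  # 4.0438
```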

Conclusions and discussion
In this study, the GPV-based method is proposed for testing the fraction retention NI hypothesis under the normality assumption and heteroskedasticity. Overall, for the different values of $b$ and $b/r$ considered, the simulation results show that the GPV-based method exhibits adequate type I error rate control at the nominal level. The empirical power of the GPV-based method is also higher than that of the ratio, Rothmann's and Wang's tests. In addition, the GPV-based method requires simpler computation of the type I error rate and the empirical power than the ratio test. Therefore, based on the results of this study under the normality assumption, the GPV-based method can be recommended for testing the fraction retention NI hypothesis. These results add to the many situations in which the concept of the GPV has been successfully applied. An R program for computing the GPV-based method is available in S3 Appendix. The GPV-based method is worth applying under the normality assumption but may not be suitable for other probability models; its behavior under other distributional assumptions needs further research. One should also note that, under the normality assumption, the required percentiles of the GPQ for $\mu_{NI}/\mu_H$ cannot be obtained in closed form, but can be estimated using Monte Carlo methods. Hence, the GPV-based method may be less suited to determining optimal sample sizes.