Abstract
The consideration of individual equivalence provides an essential alternative to average equivalence in two-group comparative studies. A common procedure for declaring individual equivalence adopts the tolerance intervals of the designated proportions of measurement differences. This statistical practice is a direct generalization of the widely used two one-sided tests (TOST) for average equivalence. Such TOST extensions often do not have adequate control of Type I error and result in excessively conservative tests. To signify and resolve the underlying issues of existing methods, this paper presents exact tests for assessing individual equivalence between two treatments under parallel group and crossover designs. Rigorous evaluations are conducted to clarify the discrepancy of critical values and Type I error probabilities between the equivalence procedures. The findings elucidate the shortcoming of the TOST technique and the advantage of the proposed approach. The associated power and sample size calculations are also justified through simulation studies.
Citation: Shieh G (2022) Assessing individual equivalence in parallel group and crossover designs: Exact test and sample size procedures. PLoS ONE 17(5): e0269128. https://doi.org/10.1371/journal.pone.0269128
Editor: Paul Aurelian Gagniuc, University Politehnica of Bucharest, ROMANIA
Received: November 19, 2021; Accepted: May 15, 2022; Published: May 27, 2022
Copyright: © 2022 Gwowen Shieh. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This work was supported by a grant from the Ministry of Science and Technology (MOST 109-2410-H-009-021-MY2). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The two one-sided tests (TOST) procedure of mean equivalence, first described by Schuirmann [1] and Westlake [2], is the most common method in equivalence methodology. The conceptual simplicity and technical feasibility of TOST provide an important reform to apply appropriate statistical tools for equivalence, rather than relying on failure to reject the conventional hypothesis of no difference between treatment effects. Meyners [3] presented a comprehensive review of different types of equivalence tests. Moreover, Hauschke, Steinijans, and Pigeot [4], Chow and Liu [5], Wellek [6], and Choudhary and Nagaraja [7] discussed the concepts and techniques for the design and analysis of equivalence studies. The TOST for mean equivalence focuses on the mean parameters of the target populations and represents a vital method within the general scope of average equivalence. It is important to note that mean equivalence testing specifies only the population mean difference and does not concern the other characteristics associated with the underlying distribution of measurement differences. Accordingly, the principle of average equivalence only demands similar average bioavailability and does not guarantee equivalence in intra-subject variability and closeness of the response distribution between the test and reference formulations.
In view of the practical issue and important problem about the interchangeability of bioequivalence drug products, the notion of individual equivalence has been proposed to ensure switchability when a large proportion of individuals need to be sufficiently similar on the two drug formulations. The basic concept and rationale of individual equivalence are described in Anderson and Hauck [8], Hauck and Anderson [9], Sheiner [10], Schall and Luus [11], and Anderson [12]. Various individual equivalence principles and techniques have been proposed to evaluate exchangeability or switchability in terms of the desired proportion of the subject-level differences between two formulations. In particular, the commonly used reference limits of 95% proportion encompass the 2.5th percentile and 97.5th percentile for the distribution of measurement differences. Accordingly, the normal percentile is a linear function of the mean and standard deviation of the designated population. Statistical procedures and theoretical investigations of normal percentiles are essential for assessing individual equivalence.
For the mean equivalence appraisals considered in the TOST, the duality between decision rules and confidence intervals is well documented. Specifically, the null hypothesis of no mean equivalence is rejected if and only if the confidence limits of the corresponding equal-tail two-sided 100(1–2α)% confidence interval of mean difference are contained in the designated equivalence bounds. Within the context of individual equivalence, the target parameters are the lower and upper percentiles for describing the desired population proportion. It is appealing to apply the confidence interval procedure to the normal percentiles of the distribution of subject-level differences. The one-sided confidence intervals of normal percentiles have a close link to the one-sided tolerance bounds of a normal distribution. This technical correspondence reveals that tolerance interval estimation has an extended utility in assessing individual bioequivalence. The notion of confidence intervals for mean equivalence or average equivalence has been extended to the appraisals of individual equivalence, such as the TOST methods presented in Esinhart and Chinchilli [13], Liu and Chow [14], and Tsong and Shen [15], among others. Accordingly, tolerance intervals are constructed for the desired proportions of measurement differences and individual equivalence is claimed if the resulting interval limits are within the selected equivalence range. General discussions of tolerance interval estimation are available in Krishnamoorthy and Mathew [16] and Meeker, Hahn, and Escobar [17].
Due to the close resemblance between tolerance intervals and confidence intervals, the TOST method for assessing individual equivalence is presumed to share the same desirable properties of the counterpart TOST for establishing mean equivalence. However, Berger and Hsu [18] showed that size-α bioequivalence tests do not generally correspond to 100(1–2α)% confidence sets. It is strongly advocated in Berger and Hsu [18] that statistically sound techniques should be employed to derive a test with the specified Type I error rate. Notably, the prescribed TOST methods for individual equivalence were conducted with respect to tolerance interval estimation. The corresponding numerical results did not directly evaluate their Type I error control in hypothesis testing. Although the assessment of individual equivalence mainly focuses on biopharmaceutical applications, the concept and analysis are pertinent to comparative studies across virtually all scientific disciplines. It is of great interest to clarify the potential deficiency and implications of current methods in equivalence testing.
Following the two-sided sampling plan in Owen [19], this article presents a unified approach for evaluating individual equivalence between two treatment formulations. Exact test procedures are described for the parallel group and crossover designs. Extensive numerical investigations are conducted to demonstrate the underlying features of the suggested and TOST procedures. The comparisons and findings reveal an essential discrepancy in critical values and Type I error rates that has not been addressed in the literature. The results update the less-recognized problems of the current TOST methods for examining individual equivalence in Liu and Chow [14], and Tsong and Shen [15]. To enhance the usefulness of the proposed approach, the associated power and sample size calculations are also demonstrated for planning individual equivalence studies. Computer algorithms for computing the critical value, statistical power, and sample size of the suggested test procedures are available as supplemental material. It should be noted that Owen [19] did not address hypothesis testing, power analysis, or sample size determination for appraising individual equivalence. Moreover, the technical arguments presented here are more analytically transparent than the formulation based on the bivariate noncentral t distribution in Owen [19].
Methods
Parallel group design
Consider independent random samples from two normal populations with the following formulations:
$$X_{ij} \sim N(\mu_i, \sigma^2), \tag{1}$$
where $\mu_i$ and $\sigma^2$ are unknown parameters, $j = 1, \ldots, N_i$, and $i = 1$ and 2. To establish individual equivalence between two treatments, the central portion of the difference between the individual measurements of two treatments $X_{1j} - X_{2j'}$ needs to lie within a reasonable range around zero. The 100·pth percentile of the distribution $N(\mu_D, \sigma_D^2)$ of $X_{1j} - X_{2j'}$ is denoted by
$$\theta_p = \mu_D + z_p\sigma_D, \tag{2}$$
where $\mu_D = \mu_1 - \mu_2$, $\sigma_D^2 = 2\sigma^2$, $z_p$ is the 100·pth percentile of the standard normal distribution $N(0, 1)$, and $0 < p < 1$. The null and alternative hypotheses of the individual equivalence test are expressed as
$$H_0: \theta_{1-p} \le \Delta_L \text{ or } \Delta_U \le \theta_p \quad \text{versus} \quad H_1: \Delta_L < \theta_{1-p} \text{ and } \theta_p < \Delta_U, \tag{3}$$
where $p > 0.5$ and the two designated constants $\Delta_L$ and $\Delta_U$ represent the lower and upper thresholds of the percentile range for declaring individual equivalence between two treatments. The alternative hypothesis indicates that at least a central proportion $p^* = 2p - 1$ of the distribution $N(\mu_D, \sigma_D^2)$ lies in the range $(\Delta_L, \Delta_U)$.
Whereas the individual equivalence problem concerns the central proportion of a target distribution in terms of the pair of percentiles ($\theta_{1-p}$, $\theta_p$), Shieh [20] compared alternative approaches for difference, noninferiority, and equivalence testing of a single normal percentile. Similar to the widely used TOST for mean equivalence, Shieh [20] showed that the TOST procedure for the comparability of a designated percentile also maintains good control of the Type I error rate at the specified value. These promising results suggest that the TOST principle can be useful for similar problems in more advanced designs and complex scenarios. However, the critical exposition presented next demonstrates that the TOST extensions for individual equivalence do not have adequate control of Type I error and result in overly conservative tests.
The TOST procedure for parallel group design.
To demonstrate average equivalence between two treatment means, the TOST procedure rejects the null hypothesis of incomparability if the ordinary 100(1–2α)% equal-tailed confidence interval of mean difference is entirely included in the equivalence range. The same principle was extended to individual equivalence assessment for exchangeability between the test and standard treatments in Tsong and Shen [15]. A concise illustration is presented to simplify the complicated results in Tsong and Shen [15].
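The usual two-sample t statistic has the form
$$T = \frac{\bar{X}_1 - \bar{X}_2 - \mu_D}{(S^2/M)^{1/2}},$$
where $\bar{X}_1 = \sum_{j=1}^{N_1} X_{1j}/N_1$, $\bar{X}_2 = \sum_{j=1}^{N_2} X_{2j}/N_2$, $M = 1/(1/N_1 + 1/N_2)$, $S^2 = \{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2\}/v$, $S_1^2 = \sum_{j=1}^{N_1} (X_{1j} - \bar{X}_1)^2/(N_1 - 1)$, $S_2^2 = \sum_{j=1}^{N_2} (X_{2j} - \bar{X}_2)^2/(N_2 - 1)$, and $v = N_1 + N_2 - 2$. The ordinary interval limits ($\hat{\mu}_{DL}$, $\hat{\mu}_{DU}$) of a 100(1–2α)% equal-tailed confidence interval of $\mu_D$ are
$$\hat{\mu}_{DL} = \bar{X}_1 - \bar{X}_2 - t_{v,1-\alpha}(S^2/M)^{1/2} \quad \text{and} \quad \hat{\mu}_{DU} = \bar{X}_1 - \bar{X}_2 + t_{v,1-\alpha}(S^2/M)^{1/2}, \tag{4}$$
respectively, where $t_{v,1-\alpha}$ is the 100(1 – α)th percentile of the t distribution with degrees of freedom $v$. In addition to the practical usefulness for interval estimation, the range {$\hat{\mu}_{DL}$, $\hat{\mu}_{DU}$} has an interesting connection to equivalence assessment. A well-known simple approach to conduct the TOST for mean equivalence is to examine whether the 100(1–2α)% confidence interval ($\hat{\mu}_{DL}$, $\hat{\mu}_{DU}$) of $\mu_D$ falls within the designated range ($\delta_L$, $\delta_U$), where $\delta_L$ and $\delta_U$ are a priori constants that represent the sensible bounds for declaring mean equivalence.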
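It is straightforward to show that the pivotal quantity for $\theta_{1-p}$ has a noncentral t distribution
$$\frac{\bar{X}_1 - \bar{X}_2 - \theta_{1-p}}{(S^2/M)^{1/2}} \sim t(v, z_p(2M)^{1/2}), \tag{5}$$
where $t(v, z_p(2M)^{1/2})$ is a noncentral t distribution with degrees of freedom $v$ and noncentrality $z_p(2M)^{1/2}$. The exact lower confidence limit of an upper 100(1 – α)% one-sided confidence interval {$\hat{\theta}_{TL}$, ∞} of $\theta_{1-p}$ can be obtained as
$$\hat{\theta}_{TL} = \bar{X}_1 - \bar{X}_2 - \tau_T(S^2/M)^{1/2}, \tag{6}$$
where $\tau_T = t_{1-\alpha}(v, z_p(2M)^{1/2})$ is the 100(1 – α)th percentile of a noncentral t distribution $t(v, z_p(2M)^{1/2})$. Similarly, the pivotal quantity for $\theta_p$ is distributed as
$$\frac{\bar{X}_1 - \bar{X}_2 - \theta_p}{(S^2/M)^{1/2}} \sim t(v, -z_p(2M)^{1/2}). \tag{7}$$
Using the important property of a noncentral t distribution as in Johnson, Kotz, and Balakrishnan [21, Chapter 31] that $t_{1-\alpha}(v, z_p(2M)^{1/2}) = -t_{\alpha}(v, -z_p(2M)^{1/2})$ for $0 < \alpha < 1$, the exact upper confidence limit of a lower 100(1 – α)% one-sided confidence interval {–∞, $\hat{\theta}_{TU}$} of $\theta_p$ can be expressed as
$$\hat{\theta}_{TU} = \bar{X}_1 - \bar{X}_2 + \tau_T(S^2/M)^{1/2}. \tag{8}$$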
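The one-sided limits in Eqs 6 and 8 can be evaluated with a noncentral t quantile. The following is a minimal sketch, assuming SciPy is available; the function name `tost_percentile_limits` and the summary statistics passed to it (a mean difference of 0 and a pooled variance of 0.5) are hypothetical and serve only as an illustration, not as the author's supplementary program.

```python
from math import sqrt

from scipy.stats import nct, norm


def tost_percentile_limits(mean_diff, s2, n1, n2, p, alpha=0.05):
    """Return tau_T and the one-sided limits of Eqs 6 and 8."""
    m = 1.0 / (1.0 / n1 + 1.0 / n2)          # M = 1/(1/N1 + 1/N2)
    v = n1 + n2 - 2                          # degrees of freedom
    delta = norm.ppf(p) * sqrt(2.0 * m)      # noncentrality z_p (2M)^{1/2}
    tau_t = nct.ppf(1.0 - alpha, v, delta)   # 100(1 - alpha)th percentile
    half_width = tau_t * sqrt(s2 / m)
    return tau_t, mean_diff - half_width, mean_diff + half_width


# Hypothetical summary statistics; p = 0.90 corresponds to p* = 0.80.
tau_t, lower, upper = tost_percentile_limits(0.0, 0.5, 20, 20, p=0.90)
print(f"tau_T = {tau_t:.4f}, limits = ({lower:.4f}, {upper:.4f})")
```

Both limits use the same critical value $\tau_T$, which is why the resulting interval is symmetric about the observed mean difference.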
Note that the one-sided confidence intervals of normal percentiles are technically identical to the one-sided tolerance bounds of a normal distribution as noted in Hahn [22, 23]. The derived confidence limits $\hat{\theta}_{TL}$ and $\hat{\theta}_{TU}$ assure that $P\{\hat{\theta}_{TL} < \theta_{1-p} < \infty\} = P\{P[\hat{\theta}_{TL} < (X_{1j} - X_{2j'}) \mid (\bar{X}_1 - \bar{X}_2, S^2)] > p\} = 1 - \alpha$ and $P\{-\infty < \theta_p < \hat{\theta}_{TU}\} = P\{P[(X_{1j} - X_{2j'}) < \hat{\theta}_{TU} \mid (\bar{X}_1 - \bar{X}_2, S^2)] > p\} = 1 - \alpha$, respectively. Accordingly, for $p > 0.5$, a lower 100(1 – α)% confidence limit for the 100(1–p)th percentile $\theta_{1-p}$ is equivalent to a lower tolerance limit to be exceeded by at least a proportion $p$ of the population with probability 1 – α. Likewise, an upper 100(1 – α)% confidence limit for the 100·pth percentile $\theta_p$ for $p > 0.5$ is equivalent to an upper tolerance limit to exceed at least a proportion $p$ of the population with probability 1 – α.
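As an extension to the use of tolerance intervals for the assessment of individual bioequivalence, Tsong and Shen [15] suggested that the null hypothesis $H_0$: $\theta_{1-p} \le \Delta_L$ or $\Delta_U \le \theta_p$ is rejected if
$$\Delta_L < \hat{\theta}_{TL} \text{ and } \hat{\theta}_{TU} < \Delta_U \tag{9}$$
or
$$\tau_T < T_L \text{ and } T_U < -\tau_T, \tag{10}$$
where $\hat{\theta}_{TL}$ and $\hat{\theta}_{TU}$ are the confidence limits in Eqs 6 and 8, $T_L = (\bar{X}_1 - \bar{X}_2 - \Delta_L)/(S^2/M)^{1/2}$, and $T_U = (\bar{X}_1 - \bar{X}_2 - \Delta_U)/(S^2/M)^{1/2}$.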
The strong resemblance between ($\hat{\mu}_{DL}$, $\hat{\mu}_{DU}$) and {$\hat{\theta}_{TL}$, $\hat{\theta}_{TU}$} in formulation and testing suggests that the rejection region {$\hat{\theta}_{TL}$, $\hat{\theta}_{TU}$} for individual equivalence may possess statistical properties similar to those of the confidence interval ($\hat{\mu}_{DL}$, $\hat{\mu}_{DU}$) for mean equivalence. Specifically, the TOST of mean equivalence based on ($\hat{\mu}_{DL}$, $\hat{\mu}_{DU}$) adequately controls the Type I error rate at the specified value. However, Berger and Hsu [18] exemplified that an equivalence procedure in terms of a 100(1–2α)% confidence interval can lead to a liberal or conservative test. The Type I error rate associated with the TOST of individual equivalence is evaluated by $\alpha_{TOST} = P\{\tau_T < T_L \text{ and } T_U < -\tau_T\}$ at the boundary values ($\theta_{1-p}$, $\theta_p$) = ($\Delta_L$, $\Delta_U$). It follows from $\Delta_L = \theta_{1-p} = \mu_D - z_p\sigma_D$ and $\Delta_U = \theta_p = \mu_D + z_p\sigma_D$ that $T_L \sim t(v, z_p(2M)^{1/2})$ and $T_U = T_L - 2z_p\sigma_D/(S^2/M)^{1/2}$. Thus, the Type I error rate is rewritten as $\alpha_{TOST} = P\{\tau_T < T_L \text{ and } T_U < -\tau_T\} = P\{\tau_T < T_L < 2z_p\sigma_D/(S^2/M)^{1/2} - \tau_T\} \le P\{\tau_T < T_L\} = \alpha$. Note that the size of the TOST is the supremum $\sup_{H_0} \alpha_{TOST} = \alpha$, which is attained in the limit, for example as $\sigma^2$ goes to zero. However, the Type I error rate of the TOST procedure is generally less than the nominal level. The succeeding empirical investigations reveal that the discrepancy is of considerable concern. An improved procedure is proposed next to facilitate research practice in assessing individual equivalence.
The proposed procedure for parallel group design.
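By extending the two-sided sampling plan in Owen [19], the suggested exact rejection region for declaring individual equivalence is of the form
$$\Delta_L < \hat{\theta}_{EL} \text{ and } \hat{\theta}_{EU} < \Delta_U, \tag{11}$$
where $\hat{\theta}_{EL} = \bar{X}_1 - \bar{X}_2 - \tau_E(S^2/M)^{1/2}$, $\hat{\theta}_{EU} = \bar{X}_1 - \bar{X}_2 + \tau_E(S^2/M)^{1/2}$, and the quantity $\tau_E$ is selected so that the Type I error rate $\sup_{H_0} P\{\Delta_L < \hat{\theta}_{EL} \text{ and } \hat{\theta}_{EU} < \Delta_U\} = \alpha$. Note that the supremum is attained when the two percentiles coincide with the boundary values ($\theta_{1-p}$, $\theta_p$) = ($\Delta_L$, $\Delta_U$) or, alternatively, $\mu_D = (\Delta_U + \Delta_L)/2$ and $\sigma_D^2 = (\Delta_U - \Delta_L)^2/(4z_p^2)$. Accordingly, the designated critical value $\tau_E$ is obtained by
$$P\{\Delta_L < \hat{\theta}_{EL} \text{ and } \hat{\theta}_{EU} < \Delta_U \mid (\theta_{1-p}, \theta_p) = (\Delta_L, \Delta_U)\} = \alpha. \tag{12}$$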
It follows from the normal assumption defined in Eq 1 that $Z = (\bar{X}_1 - \bar{X}_2 - \mu_D)/(\sigma^2/M)^{1/2} \sim N(0, 1)$ and $K = vS^2/\sigma^2 \sim \chi^2(v)$, where $\chi^2(v)$ is a chi-square distribution with degrees of freedom $v$. Also, $Z$ and $K$ are independent. Then, the probability evaluation in Eq 12 can be expressed as
$$P\{-G < Z < G\} = \alpha, \tag{13}$$
where $G = z_p(2M)^{1/2} - \tau_E(K/v)^{1/2}$. It is computationally transparent to adopt the formulation
$$E_K[2\Phi(G_0) - 1] = \alpha, \tag{14}$$
where $G_0 = G$ if $K < k_0$ and $G_0 = 0$ if $K \ge k_0$ with $k_0 = (2vMz_p^2)/\tau_E^2$, $\Phi$ is the cumulative distribution function of the standard normal distribution, and the expectation $E_K[\cdot]$ is taken with respect to the distribution of $K$. A special-purpose computer program is required to calculate the critical value $\tau_E$ for the chosen model settings. Consequently, the null hypothesis is rejected if
$$\tau_E < T_L \text{ and } T_U < -\tau_E. \tag{15}$$
Note that the critical values $\tau_E$ of the suggested approach and $\tau_T$ of the TOST procedure generally differ. For example, when ($N_1$, $N_2$) = (20, 20), α = 0.05, and p* = 0.80, the critical values are $\tau_E$ = 6.4527 and $\tau_T$ = 7.9987 for the suggested and TOST procedures, respectively. According to the rejection rules in Eqs 10 and 15, the TOST is less likely to reject the null hypothesis than the exact procedure because $\tau_T > \tau_E$. Therefore, the two critical regions ($\hat{\theta}_{TL}$, $\hat{\theta}_{TU}$) and ($\hat{\theta}_{EL}$, $\hat{\theta}_{EU}$) do not necessarily lead to the same conclusion.
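The critical value $\tau_E$ can be found by solving $E_K[2\Phi(G_0) - 1] = \alpha$ numerically. The sketch below is one possible implementation, assuming SciPy; it is not the supplementary SAS/IML or R program. The function names, the one-dimensional quadrature over the chi-square density, and the root bracket $[z_p(2M)^{1/2}, \tau_T]$ are choices made here for illustration; the bracket presumes a small α such as 0.05.

```python
from math import sqrt

from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import chi2, nct, norm


def boundary_type1_error(tau, n1, n2, p):
    """E_K[2*Phi(G0) - 1] at (theta_{1-p}, theta_p) = (Delta_L, Delta_U)."""
    m = 1.0 / (1.0 / n1 + 1.0 / n2)
    v = n1 + n2 - 2
    c = norm.ppf(p) * sqrt(2.0 * m)          # z_p (2M)^{1/2}
    k0 = v * (c / tau) ** 2                  # G > 0 only when K < k0

    def integrand(k):
        g = c - tau * sqrt(k / v)
        return (2.0 * norm.cdf(g) - 1.0) * chi2.pdf(k, v)

    upper = min(k0, chi2.ppf(1.0 - 1e-12, v))
    return quad(integrand, 0.0, upper)[0]


def exact_critical_value(n1, n2, p, alpha=0.05):
    """Solve boundary_type1_error(tau) = alpha for tau_E; also return tau_T."""
    m = 1.0 / (1.0 / n1 + 1.0 / n2)
    v = n1 + n2 - 2
    c = norm.ppf(p) * sqrt(2.0 * m)
    tau_t = nct.ppf(1.0 - alpha, v, c)       # TOST critical value
    # The TOST is conservative, so the error at tau_t is below alpha.
    tau_e = brentq(lambda t: boundary_type1_error(t, n1, n2, p) - alpha,
                   c, tau_t)
    return tau_e, tau_t


tau_e, tau_t = exact_critical_value(20, 20, p=0.90, alpha=0.05)
print(f"tau_E = {tau_e:.4f}, tau_T = {tau_t:.4f}")
```

Evaluating `boundary_type1_error(tau_t, 20, 20, 0.90)` gives the boundary Type I error rate of the TOST procedure under the same formulation, which makes its conservativeness directly visible.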
On the other hand, with the definitions of the two random variables $Z$ and $K$, it can be shown that the corresponding power function is
$$\Psi_E = P\{\tau_E < T_L \text{ and } T_U < -\tau_E\} = P\{G_L < Z < G_U\}, \tag{16}$$
where $G_L = (\Delta_L - \mu_D)/(\sigma^2/M)^{1/2} + \tau_E(K/v)^{1/2}$ and $G_U = (\Delta_U - \mu_D)/(\sigma^2/M)^{1/2} - \tau_E(K/v)^{1/2}$. Note that the power calculation is meaningful only when $G_L < G_U$ or $K < k_1$, where $k_1 = \{vM(\Delta_U - \Delta_L)^2\}/(4\sigma^2\tau_E^2)$. A transparent and convenient expression of the power function is
$$\Psi_E = E_K[\Phi(G_{U1}) - \Phi(G_{L1})], \tag{17}$$
where $G_{L1} = G_L$ and $G_{U1} = G_U$ if $K < k_1$, and $G_{L1} = 0$ and $G_{U1} = 0$ if $K \ge k_1$. The power formula $\Psi_E$ is useful for computing the achieved power with the given sample sizes, and for determining the required sample sizes to attain the nominal power under the selected configurations ($\Delta_L$, $\Delta_U$, p*, α, $\mu_1$, $\mu_2$, $\sigma^2$).
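The power function can be evaluated with the same one-dimensional integration over $K$. The following sketch assumes SciPy and uses hypothetical function names (`reject_prob`, `critical_value`, `min_balanced_n`); it is an illustration of the formula rather than the author's supplementary code. The critical value is obtained by setting the rejection probability at a boundary configuration equal to α, and the root bracket presumes a small α.

```python
from math import sqrt

from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import chi2, nct, norm


def reject_prob(tau, n1, n2, d_l, d_u, mu_d, sigma2):
    """P{tau < T_L and T_U < -tau} = E_K[Phi(G_U1) - Phi(G_L1)]."""
    m = 1.0 / (1.0 / n1 + 1.0 / n2)
    v = n1 + n2 - 2
    se = sqrt(sigma2 / m)
    k1 = v * m * (d_u - d_l) ** 2 / (4.0 * sigma2 * tau ** 2)

    def integrand(k):
        w = tau * sqrt(k / v)
        g_l = (d_l - mu_d) / se + w
        g_u = (d_u - mu_d) / se - w
        return (norm.cdf(g_u) - norm.cdf(g_l)) * chi2.pdf(k, v)

    upper = min(k1, chi2.ppf(1.0 - 1e-12, v))   # G_L < G_U only if K < k1
    return quad(integrand, 0.0, upper)[0]


def critical_value(n1, n2, p, alpha):
    """tau_E: rejection probability equals alpha at the boundary."""
    m = 1.0 / (1.0 / n1 + 1.0 / n2)
    z = norm.ppf(p)
    tau_t = nct.ppf(1.0 - alpha, n1 + n2 - 2, z * sqrt(2.0 * m))
    # Boundary with mu_D = 0 and sigma^2 = 1, so sigma_D = sqrt(2).
    d = z * sqrt(2.0)
    return brentq(lambda t: reject_prob(t, n1, n2, -d, d, 0.0, 1.0) - alpha,
                  z * sqrt(2.0 * m), tau_t)


def min_balanced_n(target, p, alpha, d_l, d_u, mu_d, sigma2,
                   n_start=5, n_max=2000):
    """Smallest N1 = N2 whose power reaches the target (linear search)."""
    for n in range(n_start, n_max + 1):
        tau = critical_value(n, n, p, alpha)
        if reject_prob(tau, n, n, d_l, d_u, mu_d, sigma2) >= target:
            return n
    return None


p, alpha, n = 0.975, 0.05, 40               # p* = 0.95, N1 = N2 = 40
z = norm.ppf(p)
tau = critical_value(n, n, p, alpha)
size = reject_prob(tau, n, n, -z, z, 0.0, 0.5)    # boundary: sigma_D^2 = 1
power = reject_prob(tau, n, n, -z, z, 0.0, 0.3)   # alternative: sigma_D^2 = 0.6
print(f"tau_E = {tau:.4f}, size = {size:.4f}, power = {power:.4f}")
```

A call such as `min_balanced_n(0.9, 0.975, 0.05, -z, z, 0.0, 0.3)` searches for the per-group sample size; the linear search is simple but repeats the root-finding step for every candidate size.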
Crossover design
In bioequivalence studies, a common scenario for comparing treatments is the two-period crossover design. Consider the standard two-sequence and two-period crossover design in terms of the model
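$$Y_{ijk} = \mu + F_{ij} + P_j + S_{ik} + \varepsilon_{ijk}, \tag{18}$$
where $Y_{ijk}$ is the outcome for the kth subject in the ith sequence and jth period, $\mu$ is the grand mean, $F_{ij}$ is the formulation effect, $P_j$ is the fixed period effect, $S_{ik}$ is the random subject effect, and $\varepsilon_{ijk}$ is the random error for $i = 1$ and 2, $j = 1$ and 2, and $k = 1, \ldots, N_i$. The formulation effects are expressed as $F_{11} = F_{22} = \mu_R$ and $F_{12} = F_{21} = \mu_T$ for the reference product and test product, respectively, {$S_{ik}$} are independent $N(0, \sigma_S^2)$ variables, and {$\varepsilon_{ijk}$} are independent $N(0, \sigma_{\varepsilon ij}^2)$ variables with $\sigma_{\varepsilon 11}^2 = \sigma_{\varepsilon 22}^2 = \sigma_{\varepsilon R}^2$ and $\sigma_{\varepsilon 12}^2 = \sigma_{\varepsilon 21}^2 = \sigma_{\varepsilon T}^2$. Moreover, it is assumed that $P_1 + P_2 = \mu_R + \mu_T = 0$.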
To establish individual equivalence between two treatments in the crossover design, the central portion of the contrast for the individual measurements of two treatments $(C_{1k} - C_{2k'})/2$ needs to be within a reasonable range around zero, where $C_{ik} = (Y_{i2k} - Y_{i1k})/2$ for $i = 1$ and 2. Accordingly, $(C_{1k} - C_{2k'}) \sim N(\mu_C, \sigma_C^2)$, where $\mu_C = \mu_T - \mu_R$, $\sigma_C^2 = 2\sigma^2$, and $\sigma^2 = (\sigma_{\varepsilon R}^2 + \sigma_{\varepsilon T}^2)/4$. The 100·pth percentile for the distribution of $(C_{1k} - C_{2k'})$ is denoted by
$$\theta_p = \mu_C + z_p\sigma_C \tag{19}$$
for $0 < p < 1$ as in Eq 2. An unbiased estimator of the difference between the two treatments $\mu_C$ is the sample mean difference $\bar{C}_1 - \bar{C}_2$, where $\bar{C}_i = \sum_{k=1}^{N_i} C_{ik}/N_i$ for $i = 1$ and 2. It is clear that $E(\bar{C}_1) = \mu_C/2$, $E(\bar{C}_2) = -\mu_C/2$, $\mathrm{Var}(\bar{C}_1) = \sigma^2/N_1$, and $\mathrm{Var}(\bar{C}_2) = \sigma^2/N_2$. Hence, the mean difference $\bar{C}_1 - \bar{C}_2$ has the distribution $N(\mu_C, \sigma^2/M)$, where $M = 1/(1/N_1 + 1/N_2)$. Moreover, $S^2 = \sum_{i=1}^{2}\sum_{k=1}^{N_i} (C_{ik} - \bar{C}_i)^2/v$ is an unbiased estimator of $\sigma^2$ and $K = (vS^2)/\sigma^2$ has a chi-square distribution with degrees of freedom $v = N_1 + N_2 - 2$. The formulations and properties for the crossover design closely resemble those of the parallel group design. Accordingly, the conceptual and statistical similarities enable the conversion of the individual equivalence inference of the parallel group design into that of the crossover design.
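The summary statistics for the crossover design follow directly from the within-subject period contrasts. The sketch below assumes NumPy and uses simulated, purely hypothetical responses (the arrays `y_seq1` and `y_seq2`, with one row per subject and one column per period); it only illustrates how the contrasts, the mean difference, and the pooled variance are formed.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 10, 10

# Hypothetical responses: rows are subjects, columns are period 1 and period 2.
y_seq1 = rng.normal(4.0, 0.3, size=(n1, 2))
y_seq2 = rng.normal(4.0, 0.3, size=(n2, 2))

# Period contrasts C_ik = (Y_i2k - Y_i1k) / 2 for each sequence.
c1 = (y_seq1[:, 1] - y_seq1[:, 0]) / 2.0
c2 = (y_seq2[:, 1] - y_seq2[:, 0]) / 2.0

mean_diff = c1.mean() - c2.mean()            # estimator of mu_C
v = n1 + n2 - 2
s2 = (np.sum((c1 - c1.mean()) ** 2) + np.sum((c2 - c2.mean()) ** 2)) / v
m = 1.0 / (1.0 / n1 + 1.0 / n2)
std_err = np.sqrt(s2 / m)                    # (S^2 / M)^{1/2}

print(f"mean difference = {mean_diff:.4f}, S^2 = {s2:.4f}, M = {m:.1f}")
```

Once `mean_diff`, `s2`, and `m` are available, the test statistics and limits take the same form as in the parallel group design.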
The TOST procedure for crossover design.
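By analogy to the parallel group design, the individual equivalence problem within the context of crossover design can be conducted with respect to the null and alternative hypotheses given in Eq 3. Following the TOST principle for assessing equivalence of mean effects, Liu and Chow [14] proposed an extension for declaring individual equivalence based on the lower confidence limit of an upper 100(1 – α)% one-sided confidence interval of $\theta_{1-p}$ and the upper confidence limit of a lower 100(1 – α)% one-sided confidence interval of $\theta_p$. Specifically, Liu and Chow [14] suggested that the null hypothesis of no individual equivalence is rejected if
$$\Delta_L < \hat{\theta}_{CTL} \text{ and } \hat{\theta}_{CTU} < \Delta_U \tag{20}$$
or
$$\tau_{CT} < T_{CL} \text{ and } T_{CU} < -\tau_{CT}, \tag{21}$$
where $\hat{\theta}_{CTL}$ and $\hat{\theta}_{CTU}$ equal $\bar{C}_1 - \bar{C}_2 \mp \tau_{CT}(S^2/M)^{1/2}$, $T_{CL} = (\bar{C}_1 - \bar{C}_2 - \Delta_L)/(S^2/M)^{1/2}$, $T_{CU} = (\bar{C}_1 - \bar{C}_2 - \Delta_U)/(S^2/M)^{1/2}$, and the critical value $\tau_{CT} = \tau_T = t_{1-\alpha}(v, z_p(2M)^{1/2})$.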
The proposed procedure for crossover design.
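In the case of the crossover design, the proposed exact rejection region for declaring individual equivalence is of the form
$$\Delta_L < \hat{\theta}_{CEL} \text{ and } \hat{\theta}_{CEU} < \Delta_U, \tag{22}$$
where $\hat{\theta}_{CEL} = \bar{C}_1 - \bar{C}_2 - \tau_{CE}(S^2/M)^{1/2}$, $\hat{\theta}_{CEU} = \bar{C}_1 - \bar{C}_2 + \tau_{CE}(S^2/M)^{1/2}$, and the quantity $\tau_{CE}$ is selected so that the Type I error rate $\sup_{H_0} P\{\Delta_L < \hat{\theta}_{CEL} \text{ and } \hat{\theta}_{CEU} < \Delta_U\} = \alpha$. This evaluation of the Type I error rate has the same statistical property as that of the parallel group design, and the critical value can be obtained with the identical technique. Consequently, with a similar argument and notation, it can be shown that the critical value $\tau_{CE}$ is identical to that of the parallel group design: $\tau_{CE} = \tau_E$. Alternatively, the null hypothesis is rejected if
$$\tau_{CE} < T_{CL} \text{ and } T_{CU} < -\tau_{CE}. \tag{23}$$
The corresponding power function is
$$\Psi_{CE} = P\{G_{CL} < Z < G_{CU}\}, \tag{24}$$
where $G_{CL} = (\Delta_L - \mu_C)/(\sigma^2/M)^{1/2} + \tau_{CE}(K/v)^{1/2}$, $G_{CU} = (\Delta_U - \mu_C)/(\sigma^2/M)^{1/2} - \tau_{CE}(K/v)^{1/2}$, $Z \sim N(0, 1)$, and $K \sim \chi^2(v)$. For computational ease, an alternative formulation of $\Psi_{CE}$ is
$$\Psi_{CE} = E_K[\Phi(G_{CU1}) - \Phi(G_{CL1})], \tag{25}$$
where $G_{CL1} = G_{CL}$ and $G_{CU1} = G_{CU}$ if $K < k_{C1}$, $G_{CL1} = 0$ and $G_{CU1} = 0$ if $K \ge k_{C1}$, and $k_{C1} = \{vM(\Delta_U - \Delta_L)^2\}/(4\sigma^2\tau_{CE}^2)$. For ease of illustration, the endpoints of the prescribed test procedures for parallel group and crossover designs are summarized in Table 1.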
Results
Type I errors
The suggested test procedures are derived by controlling the Type I error at the nominal level. Although the critical values do not have an explicit analytic expression, they can be determined with the designated configurations (N1, N2, p*, α, ΔL, ΔU). On the other hand, the TOST procedures generalize the results for mean equivalence assessment and tolerance interval estimation. The resulting critical values and rejection regions are not directly obtained with respect to the Type I error control in hypothesis testing. It is of theoretical and practical importance to evaluate the potential discrepancy between the proposed approach and the benchmark TOST method. Accordingly, a simulation study was conducted to examine the Type I error rates under the parallel group designs.
For the numerical investigations, the selected central proportions of the individual equivalence tests are p* = 0.80, 0.90 and 0.95. The mean and variance of the null distribution $N(\mu_{D0}, \sigma_{D0}^2)$ for the individual measurement difference are chosen as $\mu_{D0} = 0$ and $\sigma_{D0}^2 = 1$. The designated thresholds ($\Delta_L$, $\Delta_U$) are determined by $\Delta_L = \mu_{D0} - z_p\sigma_{D0}$ and $\Delta_U = \mu_{D0} + z_p\sigma_{D0}$. The resulting similarity bounds are ($\Delta_L$, $\Delta_U$) = (–1.2816, 1.2816), (–1.6449, 1.6449), and (–1.9600, 1.9600) for p = 0.90, 0.95, and 0.975, respectively. Four sets of sample sizes are considered: ($N_1$, $N_2$) = (20, 20), (50, 50), (100, 100), and (200, 200). Throughout the empirical examination, the significance level is fixed as α = 0.05. Under the twelve combinations of central proportions and sample sizes, an important step is to compute the critical values $\tau_E$ and $\tau_T$ of the proposed and TOST procedures for the specified settings. According to the results presented in Table 2, the two critical values have a systematic order in that $\tau_E$ is consistently less than $\tau_T$. Hence, the TOST method has a smaller rejection rate than the suggested approach.
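The similarity bounds quoted above follow from standard normal quantiles with p = (1 + p*)/2. A short sketch, assuming SciPy, with variable names chosen here for illustration:

```python
from scipy.stats import norm

mu_d0, sigma_d0 = 0.0, 1.0                   # null distribution N(0, 1)
bounds = []
for p_star in (0.80, 0.90, 0.95):
    p = (1.0 + p_star) / 2.0                 # p* = 2p - 1
    z_p = norm.ppf(p)
    bounds.append((p, mu_d0 - z_p * sigma_d0, mu_d0 + z_p * sigma_d0))

for p, d_l, d_u in bounds:
    print(f"p = {p:.3f}: (Delta_L, Delta_U) = ({d_l:.4f}, {d_u:.4f})")
```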
The simulated Type I error rates of the individual equivalence tests were computed via Monte Carlo simulation of 10,000 independent data sets. For the two test procedures, the simulated Type I error rates were the proportion of the 10,000 replicates whose critical intervals ($\hat{\theta}_{TL}$, $\hat{\theta}_{TU}$) and ($\hat{\theta}_{EL}$, $\hat{\theta}_{EU}$) were within the range of ($\Delta_L$, $\Delta_U$). The simulated Type I error probabilities under the four different sample sizes are summarized in Tables 3–5 for the three central portions p* = 0.80, 0.90, and 0.95, respectively. The adequacy of the two procedures is determined by the difference between the simulated Type I error rate and the nominal level 0.05 as summarized in the tables. To visualize the differences between the two procedures, the simulated results for p* = 0.90 in Table 4 are also plotted in Fig 1. It is evident that the simulated Type I error rates of the suggested approach are almost identical to the nominal value 0.05. In contrast, the simulated Type I error probabilities of the TOST method are less than 0.01 for the 12 settings considered here. These findings suggest that the proposed procedure has adequate Type I error control, whereas the TOST procedure is extremely conservative.
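A Monte Carlo check of this kind can be sketched as follows for the single setting ($N_1$, $N_2$) = (20, 20) and p* = 0.80. The code assumes NumPy, takes the critical values $\tau_E$ = 6.4527 and $\tau_T$ = 7.9987 quoted earlier in the text as given constants, and uses an arbitrary random seed; it is an illustration, not the simulation program used for Tables 3–5.

```python
import numpy as np

rng = np.random.default_rng(2022)            # arbitrary seed for reproducibility
n1 = n2 = 20
tau_e, tau_t = 6.4527, 7.9987                # critical values quoted in the text
d_l, d_u = -1.2816, 1.2816                   # bounds for p* = 0.80
reps = 10_000

# Boundary case: mu_D = 0 and sigma_D^2 = 2 * sigma^2 = 1.
sigma = np.sqrt(0.5)
x1 = rng.normal(0.0, sigma, size=(reps, n1))
x2 = rng.normal(0.0, sigma, size=(reps, n2))

diff = x1.mean(axis=1) - x2.mean(axis=1)
v = n1 + n2 - 2
s2 = ((n1 - 1) * x1.var(axis=1, ddof=1)
      + (n2 - 1) * x2.var(axis=1, ddof=1)) / v
m = 1.0 / (1.0 / n1 + 1.0 / n2)
std_err = np.sqrt(s2 / m)

t_l = (diff - d_l) / std_err
t_u = (diff - d_u) / std_err

rate_exact = np.mean((t_l > tau_e) & (t_u < -tau_e))
rate_tost = np.mean((t_l > tau_t) & (t_u < -tau_t))
print(f"exact: {rate_exact:.4f}, TOST: {rate_tost:.4f}")
```

Because $\tau_T > \tau_E$, every replicate rejected by the TOST rule is also rejected by the exact rule, so the TOST rate can never exceed the exact rate in the same simulated data.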
Power and sample size calculations
A related and important issue for the individual equivalence test is the calculation of power and sample size. The power functions derived in Eqs 17 and 25 facilitate the desired power and sample size planning of the parallel group and crossover designs. The algorithms for computing the critical value, achieved power, and sample size are implemented in the supplementary programs. Accordingly, numerical studies were conducted to explicate the behavior of the derived power function and the usefulness of the accompanying computer algorithm in sample size determination.
Sample size determination requires the test configurations of Type I error rate α, nominal power 1 – β, equivalence bounds ($\Delta_L$, $\Delta_U$), and null central portion p*, together with the alternative settings of mean values ($\mu_1$, $\mu_2$), error variance $\sigma^2$, and sample size allocation ratio $r = N_2/N_1$. Note that the resulting percentiles $\theta_{1-p}$ and $\theta_p$ need to be within the designated bounds ($\Delta_L$, $\Delta_U$) under the alternative distribution $N(\mu_D, \sigma_D^2)$. For illustration, two central portions are considered: p* = 0.90 and 0.95 (p = 0.95 and 0.975). By fixing the null distribution $N(\mu_D, \sigma_D^2)$ as $N(0, 1)$, the resulting two sets of threshold bounds are ($\Delta_L$, $\Delta_U$) = (–1.6449, 1.6449) and (–1.9600, 1.9600). The alternative distributions are chosen to have the treatment means ($\mu_1$, $\mu_2$) = (0, 0), (0.05, 0), and (0.10, 0), and variance $\sigma_D^2$ = 0.6, 0.7 and 0.8. Under the specified configurations, the minimum total sample size $N_T = N_1 + N_2$ is computed for the balanced design r = 1 ($N_1 = N_2$), significance level α = 0.05, and nominal power 1 – β = 0.9. The estimated sample sizes and attained power levels are summarized in Table 6 for the combined 18 cases. The minimum sample size for attaining the nominal power increases with increasing mean difference $\mu_D$ or increasing variance $\sigma_D^2$ when all other factors remain fixed. It is essential to see that the magnitudes of the computed sample sizes are substantially different for the settings considered here. The smallest sample size is 80 for the setting of (p*, $\mu_D$, $\sigma_D^2$) = (0.95, 0, 0.6). On the other hand, the largest sample size 1852 is required for the situation with (p*, $\mu_D$, $\sigma_D^2$) = (0.90, 0.10, 0.8). The results indicate that the prescribed test configurations have unique and distinct influence on the power function. Conceivably, it is unlikely that a simple guideline will give accurate sample size determination.
Furthermore, under the prescribed model configurations, a simulation study was conducted to justify the accuracy of the proposed power and sample size procedures. Specifically, the simulated power of the proposed test procedure was computed via Monte Carlo simulation of 10,000 independent data sets. The simulated power and the difference between the simulated power and estimated power are also presented in Table 6. For each of the 18 scenarios, the small difference reveals that the simulated power is nearly identical to the estimated power. The accuracy of the described power and sample size procedures is fairly consistent under various sample size and parameter configurations. Consequently, these findings suggest that the developed power and sample size algorithms are reliable for practical applications.
An application
A bioequivalence study was presented in Liu and Chow [14] to demonstrate the assessment of individual equivalence between two drug formulations. Under the standard setting of a two-sequence, two-period crossover design, the responses are the area under the plasma concentration-time curve (AUC). The sample sizes, sample mean difference, and residual error variance of the logarithmic transformation of AUC are $N_1 = N_2 = 10$, $\bar{C}_1 - \bar{C}_2$ = 0.05331, and $S^2$ = 0.0378, respectively. To declare individual equivalence between the test and reference formulations, it is assumed that at least p* = 0.75 of the differences between two individual formulation measurements are within the bounds $\Delta_L$ = ln(0.80) = –0.2231 and $\Delta_U$ = ln(1.25) = 0.2231. Accordingly, the test statistics in Eq 21 can be computed as $T_{CL}$ = 3.1801 and $T_{CU}$ = –1.9537. With α = 0.05, the critical values of the TOST and proposed procedures are $\tau_{CT}$ = 6.0173 and $\tau_{CE}$ = 4.3436, respectively. Also, the two associated critical regions are ($\hat{\theta}_{CTL}$, $\hat{\theta}_{CTU}$) = (–0.4698, 0.5764) and ($\hat{\theta}_{CEL}$, $\hat{\theta}_{CEU}$) = (–0.3243, 0.4309). Because neither region lies within ($\Delta_L$, $\Delta_U$), both test procedures conclude that the null hypothesis of no individual equivalence cannot be rejected at the significance level 0.05.
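The reported quantities can be retraced from the summary statistics. The following sketch assumes SciPy; the TOST critical value is recomputed from the noncentral t distribution, whereas the exact critical value 4.3436 is taken from the text as a given constant. Small differences in the last digits relative to the reported values are to be expected from rounding of the inputs.

```python
from math import log, sqrt

from scipy.stats import nct, norm

n1 = n2 = 10
mean_diff, s2 = 0.05331, 0.0378              # summary statistics from the text
p_star, alpha = 0.75, 0.05
d_l, d_u = log(0.80), log(1.25)

m = 1.0 / (1.0 / n1 + 1.0 / n2)
v = n1 + n2 - 2
std_err = sqrt(s2 / m)
p = (1.0 + p_star) / 2.0

t_cl = (mean_diff - d_l) / std_err
t_cu = (mean_diff - d_u) / std_err

tau_ct = nct.ppf(1.0 - alpha, v, norm.ppf(p) * sqrt(2.0 * m))
tau_ce = 4.3436                              # exact critical value quoted in the text

tost_region = (mean_diff - tau_ct * std_err, mean_diff + tau_ct * std_err)
exact_region = (mean_diff - tau_ce * std_err, mean_diff + tau_ce * std_err)

reject_tost = tau_ct < t_cl and t_cu < -tau_ct
reject_exact = tau_ce < t_cl and t_cu < -tau_ce
print(f"T_CL = {t_cl:.4f}, T_CU = {t_cu:.4f}, tau_CT = {tau_ct:.4f}")
print("TOST region:", tost_region, "reject:", reject_tost)
print("Exact region:", exact_region, "reject:", reject_exact)
```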
Under the normal assumptions, the difference between two individual formulation measurements has the distribution $(C_{1k} - C_{2k'}) \sim N(\mu_C, \sigma_C^2)$. Using the summary statistics as exemplifying parameter values ($\mu_C$, $\sigma_C^2$) = (0.05331, 0.0756), the proportion between the two bounds ($\Delta_L$, $\Delta_U$) = (–0.2231, 0.2231) for the normal distribution $N(\mu_C, \sigma_C^2)$ is the probability $P(\Delta_L < C_{1k} - C_{2k'} < \Delta_U)$ = 0.5744. Note that the coverage probability is substantially less than the nominal value 0.75 for declaring individual equivalence. For illustration, the working parameters are chosen as $\mu_C$ = 0.02, 0.03, 0.04, and 0.05 and $\sigma_C^2$ = 0.0756/4. To meet the nominal power 0.80, the estimated sample sizes are ($N_1$, $N_2$) = (25, 25), (37, 37), (69, 69), and (183, 183) with the achieved power levels 0.8017, 0.8035, 0.8024, and 0.8002, respectively. Evidently, the magnitudes are larger than the sample sizes ($N_1$, $N_2$) = (10, 10) of the previous analysis. This indicates the importance and accuracy of power and sample size procedures for efficient computations in individual equivalence studies. The accompanying computer algorithms are also presented for conducting the suggested power and sample size calculations.
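The coverage probability quoted above is a difference of two normal distribution function values. A brief sketch, assuming SciPy, with a helper name (`coverage`) introduced here only for illustration:

```python
from math import log, sqrt

from scipy.stats import norm


def coverage(mu_c, sigma_c2, d_l, d_u):
    """P(Delta_L < C_1k - C_2k' < Delta_U) under N(mu_C, sigma_C^2)."""
    sd = sqrt(sigma_c2)
    return norm.cdf((d_u - mu_c) / sd) - norm.cdf((d_l - mu_c) / sd)


d_l, d_u = log(0.80), log(1.25)
observed = coverage(0.05331, 0.0756, d_l, d_u)      # summary statistics as parameters
working = coverage(0.02, 0.0756 / 4.0, d_l, d_u)    # first working configuration
print(f"observed: {observed:.4f}, working: {working:.4f}")
```

The working configuration yields a coverage above the required p* = 0.75, which is what places it inside the alternative hypothesis and makes a power calculation meaningful.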
Conclusions
The conventional TOST of mean focuses only on the equivalence of population means between the test and reference formulations. Therefore, the TOST of mean equivalence or average equivalence does not take into account the variability of formulation difference in bioavailability across subjects. In view of the limitation of average equivalence, Chen [24] identified several desirable features of bioequivalence criteria. The criteria include the assurance of switchability between formulations, the control of Type I error rate at 5%, determination of appropriate sample size, and user-friendly software application for the statistical method. Related considerations of individual equivalence can be found in the additional discussion in Chen et al. [25] and Chen and Lesko [26]. To address these issues, this article presents exact tests for assessing individual equivalence under parallel group and crossover designs. The numerical results showed that the TOST procedures based on tolerance intervals are overly conservative. More importantly, the exact approach has excellent Type I error control and can be recommended for routine use. Computer programs are also developed to implement the proposed equivalence test, power calculation, and sample size determination. The research designs and test procedures considered here are valid only if the homogeneous variance assumption is satisfied. The degree of robustness presumably depends on the extent of how badly the homogeneity of variance assumption is violated. Future research can explore possible extensions to accommodate heterogeneity of variance settings.
Supporting information
S1 File. SAS/IML programs for performing the suggested procedures.
https://doi.org/10.1371/journal.pone.0269128.s001
(PDF)
S2 File. R programs for performing the suggested procedures.
https://doi.org/10.1371/journal.pone.0269128.s002
(PDF)
References
- 1. Schuirmann D. L. (1981). On hypothesis testing to determine if the mean of a normal distribution is contained in a known interval. Biometrics, 37, 617.
- 2. Westlake W. J. (1981). Response to T.B.L. Kirkwood: Bioequivalence testing - a need to rethink. Biometrics, 37, 589–594.
- 3. Meyners M. (2012). Equivalence tests-A review. Food Quality and Preference, 26, 231–245.
- 4. Hauschke D., Steinijans V., & Pigeot I. (2007). Bioequivalence studies in drug development: Methods and applications. Chichester: John Wiley & Sons.
- 5. Chow S. C., & Liu J. P. (2008). Design and analysis of bioavailability and bioequivalence studies (3rd ed.). New York, NY: Chapman & Hall/CRC.
- 6. Wellek S. (2010). Testing statistical hypotheses of equivalence and noninferiority (2nd ed.). New York, NY: CRC Press.
- 7. Choudhary P. K., & Nagaraja H. N. (2017). Measuring agreement: Models, methods, and applications. Hoboken, NJ: John Wiley & Sons.
- 8. Anderson S., & Hauck W. W. (1990). Consideration of individual bioequivalence. Journal of Pharmacokinetics and Biopharmaceutics, 18, 259–273. pmid:2380920
- 9. Hauck W. W., & Anderson S. (1992). Types of bioequivalence and related statistical considerations. International Journal of Clinical Pharmacology, Therapy and Toxicology, 30, 181–187. pmid:1592546
- 10. Sheiner L. B. (1992). Bioequivalence revisited. Statistics in Medicine, 11, 1777–1788. pmid:1485060
- 11. Schall R., & Luus G. H. (1993). On population and individual bioequivalence. Statistics in Medicine, 12, 1109–1124. pmid:8210816
- 12. Anderson S. (1993). Individual bioequivalence: A problem of switchability (with discussion). Biopharmaceutical Report, 2, 1–11.
- 13. Esinhart J. D., & Chinchilli V. M. (1994). Extension to the use of tolerance intervals for the assessment of individual bioequivalence. Journal of Biopharmaceutical Statistics, 4, 39–52. pmid:8019583
- 14. Liu J. P., & Chow S. C. (1997). A two one-sided tests procedure for assessment of individual bioequivalence. Journal of Biopharmaceutical Statistics, 7, 49–61. pmid:9056588
- 15. Tsong Y., & Shen M. (2007). An alternative approach to assess exchangeability of a test treatment and the standard treatment with normally distributed response. Journal of Biopharmaceutical Statistics, 17, 329–338. pmid:17365227
- 16. Krishnamoorthy K., & Mathew T. (2009). Statistical tolerance regions: Theory, applications, and computation (Vol. 744). New York, NY: Wiley.
- 17. Meeker W. Q., Hahn G. J., & Escobar L. A. (2017). Statistical intervals: A guide for practitioners and researchers. Hoboken, NJ: Wiley.
- 18. Berger R. L., & Hsu J. C. (1996). Bioequivalence trials, intersection-union tests and equivalence confidence sets (with discussion). Statistical Science, 11, 283–319.
- 19. Owen D. B. (1965). A special case of a bivariate non-central t-distribution. Biometrika, 52, 437–446.
- 20. Shieh G. (2020). Comparison of alternative approaches for difference, noninferiority, and equivalence testing of normal percentiles. BMC Medical Research Methodology, 20, 59. pmid:32169043
- 21. Johnson N. L., Kotz S., & Balakrishnan N. (1995). Continuous univariate distributions (2nd ed., Vol. 2). New York, NY: Wiley.
- 22. Hahn G. J. (1970). Statistical intervals for a normal population, Part I. Tables, examples and applications. Journal of Quality Technology, 2, 115–125.
- 23. Hahn G. J. (1970). Statistical intervals for a normal population, Part II. Formulas, assumptions, some derivations. Journal of Quality Technology, 2, 195–206.
- 24. Chen M. L. (1997). Individual bioequivalence-A regulatory update. Journal of Biopharmaceutical Statistics, 7, 5–11. pmid:9056581
- 25. Chen M. L., Patnaik R., Hauck W. W., Schuirmann D. J., Hyslop T., Williams R. (2000). An individual bioequivalence criterion: Regulatory considerations. Statistics in Medicine, 19, 2821–2842. pmid:11033578
- 26. Chen M. L., & Lesko L. J. (2001). Individual bioequivalence revisited. Clinical pharmacokinetics, 40, 701–706. pmid:11707058