Sample size determination for a specific region in multiregional clinical trials with multiple co-primary endpoints

  • Wong-Shian Huang,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan, Institute of Population Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan

  • Hui-Nien Hung,

    Roles Conceptualization, Investigation, Methodology, Validation, Writing – original draft, Writing – review & editing

    Affiliation Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan

  • Toshimitsu Hamasaki,

    Roles Conceptualization, Investigation, Methodology, Validation, Writing – original draft, Writing – review & editing

    Affiliation National Cerebral and Cardiovascular Center, Osaka, Japan

  • Chin-Fu Hsiao

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    chinfu@nhri.org.tw

    Affiliations Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan, Institute of Population Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan

Abstract

Recently, multi-regional clinical trials (MRCTs), which incorporate subjects from many countries/regions around the world under the same protocol, have been widely conducted by many global pharmaceutical companies. The objective of such trials is to accelerate the development process for a drug and shorten the drug’s approval time in key markets. Several statistical methods have been proposed for the design and evaluation of MRCTs, as well as for assessing the consistency of treatment effects across all regions with one primary endpoint. However, in some therapeutic areas (e.g., Alzheimer’s disease), the clinical efficacy of a new treatment may be characterized by a set of possibly correlated endpoints, known as multiple co-primary endpoints. In this paper, we focus on a specific region and establish three statistical criteria for evaluating consistency between the specific region and overall results in MRCTs with multiple co-primary endpoints. More specifically, two of those criteria are used to assess whether the treatment effect in the region of interest is as large as that of the other regions or of the regions overall, while the third criterion is used to assess whether the treatment effect in the specific region achieves a pre-specified threshold. The sample size required for the region of interest can also be evaluated based on these three criteria.

Introduction

Recently, global drug development has attracted much attention from pharmaceutical companies. Unlike traditional clinical trials, an MRCT recruits subjects from many countries around the world under the same protocol, which has led to a new strategy for drug development. This kind of design has been widely adopted by global pharmaceutical companies, which seek simultaneous drug development, submission, and regulatory approval throughout key world markets to hasten the market availability of the drug and to improve patient access to new and innovative treatments. However, a key issue in conducting MRCTs is how to demonstrate the efficacy of a drug in all participating regions while also evaluating the possibility of applying the overall trial results to each region. To address the difficulties related to global drug development, in 1998 the International Conference on Harmonization (ICH) published “Ethnic Factors in the Acceptability of Foreign Clinical Data”, known as the E5 guideline. The idea of an MRCT was first raised in the 11th Q&A of E5 [1]. In recent years, simultaneous clinical development across the world has been rising rapidly. To establish a framework for addressing this key issue, the ICH released the draft E17 guideline “General principle on planning/designing Multi-Regional Clinical Trials” [2] in 2016 to describe general principles for the planning and design of MRCTs; another aim of that guideline was to increase the acceptability of MRCTs in global regulatory submissions.

The Japanese Ministry of Health, Labour and Welfare issued its own guidance document on MRCTs, “Basic Principles on Global Clinical Trials” [3]. This guidance provided two methods as examples to determine the number of Japanese subjects required for establishing consistency in treatment effect between the Japanese group and the entire group. Let DJapan and DAll represent the observed treatment effects for the Japanese group and the entire group. Method 1 in the Japanese guidance suggests that the sample size for Japan should fulfill

$$P\left(\frac{D_{\mathrm{Japan}}}{D_{\mathrm{All}}} > \pi\right) \ge \gamma.$$

On the other hand, suppose that an MRCT will be conducted in three regions. Let Di represent the observed treatment effect for region i, i = 1,…,3. For Method 2, the sample size should be determined to satisfy

$$P\left(D_1 > 0,\ D_2 > 0,\ D_3 > 0\right) \ge \gamma.$$

Note that the Japanese guidance requires that π be 0.5 or greater and that γ be 0.8 or greater.
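As a rough illustration of how a Method 1 type probability depends on the regional share, the sketch below evaluates the unconditional probability that a region's observed effect exceeds a fraction of the overall effect for a single normally distributed endpoint with known variance. It is only a sketch under stated assumptions: equal allocation to the two arms, a common standardized effect in all regions, the event written as D_region > π·D_all, and no conditioning on overall significance. The function name `method1_probability` and its parameters are hypothetical and do not come from the guidance or from this paper.

```python
from math import sqrt
from statistics import NormalDist


def method1_probability(n, p, delta, pi_ratio=0.5):
    """Unconditional P(D_region > pi_ratio * D_all) for one endpoint.

    n        : subjects per arm in the whole trial (assumed equal arms)
    p        : share of the trial enrolled in the region of interest (0 < p <= 1)
    delta    : standardized effect size (mean difference / sigma), assumed
               identical in every region
    pi_ratio : fraction of the overall effect the region must retain
    """
    # With D_all = p * D_region + (1 - p) * D_other, the difference
    # D_region - pi_ratio * D_all is normal with mean (1 - pi_ratio) * delta
    # and variance (2 / n) * (1 / p - 2 * pi_ratio + pi_ratio ** 2), in sigma units.
    variance = (2.0 / n) * (1.0 / p - 2.0 * pi_ratio + pi_ratio ** 2)
    return NormalDist().cdf((1.0 - pi_ratio) * delta / sqrt(variance))


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the trend in p.
    for share in (0.1, 0.2, 0.4):
        print(share, round(method1_probability(n=100, p=share, delta=0.4), 3))
```

The probability grows with the regional share p, which is why a criterion of this kind translates into a minimum number of subjects for the region.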

Several different statistical approaches based on Methods 1 and 2 in the Japanese guidance have been developed. Quan et al. [4] calculated the sample size required for Japan in an MRCT with normal, binary, and survival endpoints based on Method 1. Kawai et al. [5] proposed an approach, based on Method 2, to allocate the total sample size to the regions so that a high probability of observing a consistent trend under the assumed treatment effect across regions can be obtained. In addition, consistency criteria different from those of the Japan guidance have been established, such as those by Tsou et al. [6], Uesaka [7], Ko et al. [8], and Tsou et al. [9]. On the other hand, Chen et al. [10] and Huang et al. [11] considered ethnic differences and proposed methods that apply different treatment effects across regions to the design and evaluation of MRCTs.

However, most recent approaches to the design and evaluation of MRCTs are concerned with only one primary endpoint. In some therapeutic areas the clinical efficacy of a new treatment may be characterized by a set of possibly correlated endpoints, because there may be several different aspects to patients’ responses to that treatment. For example, a typical clinical trial for Alzheimer’s disease (AD) is usually conducted with cognitive, functional, and global endpoints to evaluate a symptomatic improvement in the dementia caused by the disease; the Committee for Medicinal Products for Human Use (CHMP) [12] and the Food and Drug Administration (FDA) [13] have recommended using two of these three as co-primary endpoints in the development of drugs for the treatment of AD. Clinical trials with “co-primary” endpoints are designed to evaluate whether the effect of a test treatment is superior (or non-inferior) to the control on all primary endpoints. Failure to demonstrate superiority on any single endpoint implies that superiority to the control treatment cannot be concluded. These endpoints are classified as follows:

  1. objective cognitive tests, e.g., the AD Assessment Scale cognitive subscale (ADAS-cog) and Severe Impairment Battery (SIB);
  2. self-care and activities of daily living, e.g., the AD Cooperative Study Activities of Daily Living (ADCS-ADL) and its modified version for severe AD; and
  3. global assessment of change, such as the Clinician’s Interview Based Impression of Change-plus (CIBIC-plus) and the Clinical Global Impression of Improvement (CGI-I).

Having such multiple endpoints raises difficulties for statisticians in handling multiplicity in the design and analysis of clinical trials, specifically in controlling Type I and Type II error rates when the endpoints are potentially correlated. When designing a trial to evaluate joint effects on all endpoints, as seen in AD clinical trials, no adjustment is needed to control the Type I error rate. However, the Type II error rate increases as the number of endpoints to be evaluated increases. This situation is referred to as “multiple co-primary endpoints” and it is related to the intersection-union problem (Hung and Wang [14]; Offen et al. [15]). In many such trials, the sample size is often unnecessarily large, which results in complications. To overcome this issue, many authors have recently discussed the design and analysis of co-primary endpoint trials using a fixed-sample-size design; the extensive references in Offen et al. [15] and Sozu et al. [16] provide many examples.

In this paper, we will focus on the design and evaluation of an MRCT with multiple co-primary endpoints. As we know, the aim of an MRCT is to show the efficacy of a drug in various global regions, and concurrently to evaluate the possibility of applying the overall trial results to each region. Therefore, we will also consider the determination of the number of subjects in a specific region to establish the consistency of treatment effects between the specific region and the entire group.

This paper is organized as follows. In section 2, we demonstrate the sample size calculation for multiple endpoints with correlation. In section 3, we establish three criteria to assess the consistency of treatment effects between a specific region and the entire group in MRCTs with multiple endpoints. Under each criterion, the sample size required for the region of interest is also evaluated. An example is provided in section 4. Discussions are given in section 5.

Material and methods

Sample size calculation

For simplicity, we focus on the most fundamental situation, where an MRCT is designed to evaluate superiority over a placebo control on K (≥ 2) continuous co-primary efficacy endpoints, and the effect size for each co-primary endpoint is assumed to be uniform across M (≥ 2) regions. Consequently, we can let Xikj and Yikl be efficacy responses on the kth co-primary endpoint for the jth subject and for the lth subject in the ith region receiving the test product and the placebo control, respectively, i = 1,…,M, j = 1,…,NiT, l = 1,…,NiC, and k = 1,…,K. Let Xij = (Xi1j, Xi2j,…, XiKj)T and Yil = (Yi1l, Yi2l,…, YiKl)T be the outcome vectors of K co-primary endpoints for the jth subject and the lth subject in the ith region receiving the test product and the placebo control, respectively.

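Since the effect size for each co-primary endpoint is uniform across regions, we can assume that $X_{ij}$ and $Y_{il}$ have multivariate normal (MVN) distributions with population mean vectors $\mu_T = (\mu_{T1},\ldots,\mu_{TK})^T$ and $\mu_C = (\mu_{C1},\ldots,\mu_{CK})^T$, respectively, and a known common covariance matrix $\Sigma = (\rho_{kk'}\sigma_k\sigma_{k'})$, where $(a_{kk'})$ denotes the matrix whose $(k,k')$th element is $a_{kk'}$, $\rho_{kk'} = \mathrm{corr}(X_{ikj}, X_{ik'j}) = \mathrm{corr}(Y_{ikl}, Y_{ik'l})$ for $k \ne k'$, and $\sigma_k^2 = \mathrm{var}(X_{ikj}) = \mathrm{var}(Y_{ikl})$. Here, we assume that the outcome variances are known, although in actual practice they are usually unknown and must be estimated from data. Let $\Delta_k = \mu_{Tk} - \mu_{Ck}$ for k = 1,…,K. Here a higher value of the population mean for each co-primary endpoint represents a better outcome. Consequently, the hypothesis testing for multiple co-primary endpoints is given as

$$H_0: \Delta_k \le 0 \ \text{for at least one } k \quad \text{vs.} \quad H_1: \Delta_k > 0 \ \text{for all } k. \tag{1}$$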

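The null hypothesis $H_0$ can be conveniently expressed as a union of a family of hypotheses. The hypothesis for each co-primary endpoint is tested at the same significance level α with $H_{0k}: \Delta_k \le 0$ vs. $H_{1k}: \Delta_k > 0$ (with $\Delta_k$ the difference in population means for the kth endpoint), and the null hypothesis $H_0$ is rejected if and only if every null hypothesis $H_{0k}$ is rejected, so that the test for multiple co-primary endpoints has significance level α. Although the hypothesis is one-sided, the proposed method can be straightforwardly extended to the two-sided hypothesis. Let $\bar{X}_k = \frac{1}{N_T}\sum_{i=1}^{M}\sum_{j=1}^{N_{iT}} X_{ikj}$ and $\bar{Y}_k = \frac{1}{N_C}\sum_{i=1}^{M}\sum_{l=1}^{N_{iC}} Y_{ikl}$ for k = 1,…,K, where $N_T$ and $N_C$ are the total numbers of subjects receiving the test product and the placebo control. Also let $Z_k = (\bar{X}_k - \bar{Y}_k)\big/\big(\sigma_k\sqrt{1/N_T + 1/N_C}\big)$ for k = 1,…,K. Subsequently, we will reject $H_0$ at the α level of significance if $Z_k > z_{1-\alpha}$ for all k = 1,…,K, where $z_{1-\alpha}$ is the 100(1 − α)th percentile of the standard normal distribution.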

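Let $N_T = \sum_{i=1}^{M} N_{iT}$ and $N_C = \sum_{i=1}^{M} N_{iC}$. In the design stage we assume equally sized groups, i.e., $N_T = N_C = N$. Let $Z = (Z_1,\ldots,Z_K)^T$. Then, under $H_1$, $Z$ is distributed as an MVN with mean vector $\sqrt{N/2}\,\delta$ and covariance matrix $\rho = (\rho_{kk'})$, where $\delta = (\Delta_1/\sigma_1,\ldots,\Delta_K/\sigma_K)^T$.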

Using the result in Sozu et al. [17,18], the power for rejecting the null hypothesis H0 can be written as

$$P\left(Z_1 > z_{1-\alpha},\ \ldots,\ Z_K > z_{1-\alpha} \,\middle|\, H_1\right).$$

This power is referred to as “conjunctive power” (Senn and Bretz [19]) or “complete power” (Westfall et al. [20]). The sample size required for achieving the desired power of 1 − β at the significance level of α for the one-sided test is the minimum N that satisfies

$$\int_{z_{1-\alpha}}^{\infty}\cdots\int_{z_{1-\alpha}}^{\infty} f\left(z_1,\ldots,z_K;\ \sqrt{N/2}\,\delta,\ \rho\right)\,dz_1\cdots dz_K \ \ge\ 1-\beta, \tag{2}$$

where $f(z_1,\ldots,z_K;\ \sqrt{N/2}\,\delta,\ \rho)$ represents the density of the MVN distribution with mean $\sqrt{N/2}\,\delta$ and covariance matrix ρ, evaluated at z1,…,zK. An iterative procedure is required to find the required sample size. The easiest way is a grid search that increases N gradually until the power under N exceeds the desired power of 1 − β, where the maximum of the sample sizes separately calculated for each endpoint can be used as the initial value. However, this often takes much computing time. To improve the speed of the sample size calculation, Sugimoto et al. [21] and Hamasaki et al. [22] provide more efficient and practical algorithms for calculating the sample sizes. Also note that since the effect size for each co-primary endpoint is assumed to be uniform across regions, there is no difference between sample size calculations for clinical trials with co-primary endpoints conducted in multiple regions and those conducted in a single region.
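The grid search described above can be sketched in a few lines for the special case K = 2. This is only an illustration under stated assumptions, not the algorithm of Sugimoto et al. [21] or Hamasaki et al. [22]: the vector Z is taken to be bivariate normal with mean sqrt(N/2)·δ and correlation ρ12 (with |ρ12| < 1), the joint probability is reduced to a one-dimensional integral and evaluated with Simpson's rule, and the function names `conjunctive_power_2` and `sample_size_per_group` are hypothetical. For general K a multivariate normal CDF routine would be needed instead.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def conjunctive_power_2(n, delta, rho12, alpha=0.025, steps=2000):
    """P(Z1 > c and Z2 > c) for K = 2, where c = z_{1-alpha} and
    (Z1, Z2) is bivariate normal with mean sqrt(n/2) * delta,
    unit variances and correlation rho12 (requires |rho12| < 1)."""
    c = STD.inv_cdf(1.0 - alpha)
    m1, m2 = (sqrt(n / 2.0) * d for d in delta)
    s = sqrt(1.0 - rho12 ** 2)

    def integrand(z):
        # density of Z1 at z times P(Z2 > c | Z1 = z)
        cond_mean = m2 + rho12 * (z - m1)
        return STD.pdf(z - m1) * (1.0 - STD.cdf((c - cond_mean) / s))

    # Simpson's rule on [c, upper]; the tail beyond `upper` is negligible.
    lower, upper = c, max(c, m1) + 8.0
    h = (upper - lower) / steps
    total = integrand(lower) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * h)
    return total * h / 3.0


def sample_size_per_group(delta, rho12, alpha=0.025, beta=0.1):
    """Smallest n per group whose conjunctive power reaches 1 - beta."""
    z_sum = STD.inv_cdf(1.0 - alpha) + STD.inv_cdf(1.0 - beta)
    # Start from the largest single-endpoint sample size, as the text suggests.
    n = ceil(2.0 * (z_sum / min(delta)) ** 2)
    while conjunctive_power_2(n, delta, rho12, alpha) < 1.0 - beta:
        n += 1
    return n


if __name__ == "__main__":
    # Setting of Table 1: delta = (0.5, 0.45), rho12 = 0.1.
    print(sample_size_per_group((0.5, 0.45), 0.1))
```

Starting the search at the largest single-endpoint sample size is safe because the joint power can never exceed the marginal power of the weakest endpoint. For the setting of Table 1 the paper reports N = 117, which gives a reference point for checking a sketch like this one.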

Applying the results of the MRCT to a specific region

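ICH E17 states that MRCTs should investigate not only consistency in treatment effects across populations but also treatment effects in overall populations. That is, the aim of an MRCT is to show the efficacy of a drug in various global regions, and concurrently to evaluate the possibility of applying the overall trial results to each region. Suppose that we are interested in judging whether a treatment is effective in a specific region, say the sth region, where 1 ≤ s ≤ M. For the kth co-primary endpoint, let $D_{ik}$ be the observed mean difference in the ith region, $D_{(-s)k}$ be the observed mean difference from regions other than the sth region, and $D_k$ be the observed mean difference from all regions. That is, writing $\bar{X}_{ik}$ and $\bar{Y}_{ik}$ for the sample means of the kth endpoint in the ith region and assuming equal allocation within each region ($N_{iT} = N_{iC} = N_i$),

$$D_{ik} = \bar{X}_{ik} - \bar{Y}_{ik}, \qquad D_{(-s)k} = \frac{\sum_{i \ne s} N_i D_{ik}}{\sum_{i \ne s} N_i}, \qquad \text{and} \qquad D_k = \frac{\sum_{i=1}^{M} N_i D_{ik}}{N}.$$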

Given that the overall result is significant at the α level, we establish the following criteria to judge whether the treatment is effective in the sth region:

  1. Ds1 > γ1D1,…,DsK > γKDK for 0 < γi < 1, i = 1,…,K;
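  2. Ds1 > γ1D(−s)1,…,DsK > γKD(−s)K for 0 < γi < 1, i = 1,…,K, where D(−s)k denotes the observed mean difference from the regions other than the sth region;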
  3. Ds1 > h1,…,DsK > hK for hi > 0, i = 1,…,K.

Here, we can see that the first two criteria are to evaluate (i) whether the treatment effect in the region of interest is as large as that of the regions overall and (ii) of the other regions. Note that Criterion (i) assures that the estimated efficacy within a specific region is not smaller than a pre-specified portion of the global effect estimator.

When the sample size for the specific region is sufficiently large, the overall results will be dominated by the specific region. In this case, consistency is easier to claim with Criterion (i) than with Criterion (ii). Therefore, Criterion (ii) tends to be more conservative than Criterion (i).

It should be noted that Criterion (i) is similar to Method 1 in the Japanese guidance. As indicated by Ikeda and Bretz [23], despite observing better results in both the entire population and the specific subpopulation, consistency sometimes cannot be claimed with Method 1. A similar undesirable characteristic also exists for our Criterion (ii). Therefore, Ikeda and Bretz [23] suggested an alternative to Method 1. Let ps denote the proportion of patients out of 2N in the sth region. If we set $h_i = z_{1-\phi_i}\,\sigma_i\sqrt{2/(p_s N)}$ for given values ϕi, Criterion (iii) is similar to the alternative method established by Ikeda and Bretz [23]. Here ϕi can be thought of as the desired significance level for a hypothesis test comparing the test product and the placebo control on the ith endpoint within patients from the specific region.
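To make the link between ϕi and the threshold concrete, the sketch below computes a Criterion (iii) threshold under the assumption that hi is the critical value of a one-sided level-ϕi z-test carried out within the region, i.e. hi = z(1−ϕi)·σi·sqrt(2/(ps·N)) with ps·N subjects per group in the region. The function name `criterion3_threshold` is hypothetical, and the inputs in the usage lines are the design values of Table 1 used purely as an illustration.

```python
from math import sqrt
from statistics import NormalDist


def criterion3_threshold(phi, sigma, p_s, n):
    """Threshold h for one endpoint under Criterion (iii).

    Assumed form: h = z_{1-phi} * sigma * sqrt(2 / (p_s * n)), i.e. the
    observed regional mean difference must exceed the critical value of a
    one-sided level-phi z-test based on p_s * n subjects per group.
    """
    z = NormalDist().inv_cdf(1.0 - phi)
    return z * sigma * sqrt(2.0 / (p_s * n))


if __name__ == "__main__":
    # Illustrative inputs: N = 117 per group, sigma = (6, 1), phi = 0.15.
    for p_s in (0.1, 0.2, 0.4):
        h = [criterion3_threshold(0.15, s, p_s, 117) for s in (6.0, 1.0)]
        print(p_s, [round(x, 3) for x in h])
```

Because the threshold shrinks like 1/sqrt(ps), a larger regional share makes Criterion (iii) easier to meet for a fixed ϕi.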

Sample size determination for a specific region

In the design stage, once N has been determined, special consideration should be placed on the determination of the number of subjects from the specific region in the MRCT. Per ICH E17, one important issue for conducting MRCTs is that the sample size allocation of regions should be determined such that clinically meaningful differences in treatment effects among regions can be described. Since analyses of the data from a specific region in the MRCT may not have enough statistical power, the number of subjects required for the specific region should be large enough to establish the consistency of treatment effects between the specific region and the regions overall. In this regard, ICH E17 has provided five approaches that can be considered for allocating the overall sample size to regions. Briefly, the first approach is to determine the regional sample sizes such that similar trends in treatment effects across regions can be demonstrated. The second approach is to determine the sample size needed in one or more regions such that the region-specific treatment effect preserves some pre-specified proportion of the overall treatment effect. The third approach is to enrol subjects in proportion to region size. The fourth approach is to determine the regional sample sizes so that significant results within one or more regions can be achieved. The last approach is to require a fixed minimum number of subjects in one or more regions.

In this section we suggest that, similar to the second approach suggested by ICH E17, the selected sample size should ensure that the assurance probability of the consistency criterion in (i), (ii), or (iii), given δ and given that the overall result is significant at the α level, is maintained at a desired level, say 80%.

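Let pi denote the proportion of patients out of 2N in the ith region, i = 1,…,M, where $\sum_{i=1}^{M} p_i = 1$. Also let Ni be the number of patients per group in the ith region. That is, Ni = piN. The assurance probabilities of Criteria (i)–(iii), given δ, can be represented by

$$AP_1 = P_\delta\left(D_{s1} > \gamma_1 D_1,\ \ldots,\ D_{sK} > \gamma_K D_K \,\middle|\, Z_1 > z_{1-\alpha},\ \ldots,\ Z_K > z_{1-\alpha}\right),$$

$$AP_2 = P_\delta\left(D_{s1} > \gamma_1 D_{(-s)1},\ \ldots,\ D_{sK} > \gamma_K D_{(-s)K} \,\middle|\, Z_1 > z_{1-\alpha},\ \ldots,\ Z_K > z_{1-\alpha}\right),$$

and

$$AP_3 = P_\delta\left(D_{s1} > h_1,\ \ldots,\ D_{sK} > h_K \,\middle|\, Z_1 > z_{1-\alpha},\ \ldots,\ Z_K > z_{1-\alpha}\right), \tag{3}$$

where $P_\delta$ is the probability measure with respect to δ and $D_{(-s)k}$ is the observed mean difference from the regions other than the sth region. Here we need to determine ps to ensure that the assurance probabilities of Criteria (i)–(iii) given δ are maintained at a desired level, say 80%. These assurance probabilities can be calculated directly from standard normal distributions after some algebraic manipulation; the details of the derivations of AP1–AP3 are given in S1 and S2 Files.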
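The assurance probabilities can also be approximated by simulation, which is a convenient cross-check on any analytic implementation. The sketch below is a Monte Carlo approximation for K = 2, not the authors' derivation in the S1 and S2 Files. It assumes equal allocation, a common effect in all regions, known variances (so everything is expressed in σ units), a Criterion (iii) threshold of the form z(1−ϕ)·sqrt(2/(ps·N)) in σ units, and that all regions other than s can be pooled into one block. The function name `assurance_probabilities` and its arguments are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def assurance_probabilities(n, p_s, delta, rho12, gamma, phi,
                            alpha=0.025, sims=20000, seed=1):
    """Monte Carlo estimates of (AP1, AP2, AP3) for K = 2 endpoints.

    All quantities are standardized by sigma_k, so `delta` holds the
    effect sizes Delta_k / sigma_k and `n` is the sample size per group.
    """
    rng = random.Random(seed)
    std = NormalDist()
    crit = std.inv_cdf(1.0 - alpha)
    se_s = sqrt(2.0 / (p_s * n))            # SE of the region-s difference
    se_o = sqrt(2.0 / ((1.0 - p_s) * n))    # SE for the pooled other regions
    h = std.inv_cdf(1.0 - phi) * se_s       # assumed Criterion (iii) threshold
    root = sqrt(1.0 - rho12 ** 2)

    def draw(se):
        # correlated pair of observed standardized mean differences
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        noise = (g1, rho12 * g1 + root * g2)
        return [d + se * e for d, e in zip(delta, noise)]

    n_sig = hits1 = hits2 = hits3 = 0
    for _ in range(sims):
        d_s, d_o = draw(se_s), draw(se_o)
        d_all = [p_s * a + (1.0 - p_s) * b for a, b in zip(d_s, d_o)]
        # keep only trials in which every endpoint is significant overall
        if not all(d * sqrt(n / 2.0) > crit for d in d_all):
            continue
        n_sig += 1
        hits1 += all(a > g * d for a, g, d in zip(d_s, gamma, d_all))
        hits2 += all(a > g * b for a, g, b in zip(d_s, gamma, d_o))
        hits3 += all(a > h for a in d_s)
    return hits1 / n_sig, hits2 / n_sig, hits3 / n_sig


if __name__ == "__main__":
    # First row of Table 1: N = 117, p1 = 0.10, phi = 0.30.
    print(assurance_probabilities(117, 0.10, (0.5, 0.45), 0.1, (0.5, 0.5), 0.30))
```

For the first row of Table 1 the paper reports 0.55, 0.53, and 0.57 (the last for ϕ = 0.30), so simulated estimates should fall close to these values up to Monte Carlo error.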

Results and discussion

Required sample sizes and assurance probabilities

Without loss of generality, we assume that we want to see whether the overall results can apply to the first region, i.e., s = 1. To illustrate our approach, let K = 2 and assume that (Δ1, Δ2) = (3, 0.45) and that (σ1, σ2) = (6, 1). That is, δ = (0.5, 0.45)T. With α = 0.025, β = 0.1, (γ1, γ2) = (0.5, 0.5), and ϕ1 = ϕ2 = ϕ, Tables 1–4 exhibit the total sample size required per group and the assurance probabilities of Criteria (i)–(iii) for ρ12 = 0.1, 0.3, 0.5, and 0.7, respectively, with various values of p1. In Table 1, the total sample size required per group for the MRCT is 117, which is calculated from formulas (1) and (2), for ρ12 = 0.1. The first line in Table 1 indicates that if the proportion of patients from the first region out of the total number of patients in the study is 0.10, the assurance probabilities of Criteria (i) and (ii) are 0.55 and 0.53, respectively, while the assurance probabilities of Criterion (iii) with thresholds corresponding to ϕ = 0.15 and ϕ = 0.30 are 0.33 and 0.57, respectively. From Table 1, to achieve an assurance probability at the 80% level, the sample size for the first region has to be around 40% of the overall sample size for Criterion (i) and around 60% for Criterion (ii). On the other hand, the assurance probabilities of Criterion (iii) will reach 80% when the values of p1 are 40% and 30% for ϕ = 0.15 and ϕ = 0.30, respectively. Note that the sample size required per group is the minimum N satisfying (1) and (2); therefore, the assurance probabilities for Criteria (i), (ii), and (iii) will increase further if the sample size in a practical trial is larger than N.

Table 1. Sample size and assurance probabilities for observing criteria (i), (ii), and (iii) given α = 0.025, β = 0.1, (Δ1, Δ2) = (3,0.45), (σ1, σ2) = (6,1), (γ1, γ2) = (0.5, 0.5), and ρ12 = 0.1.

https://doi.org/10.1371/journal.pone.0180405.t001

Table 2. Sample size and assurance probabilities for observing criteria (i), (ii), and (iii) given α = 0.025, β = 0.1, (Δ1, Δ2) = (3,0.45), (σ1, σ2) = (6,1), (γ1, γ2) = (0.5, 0.5), and ρ12 = 0.3.

https://doi.org/10.1371/journal.pone.0180405.t002

Table 3. Sample size and assurance probabilities for observing criteria (i), (ii), and (iii) given α = 0.025, β = 0.1, (Δ1, Δ2) = (3,0.45), (σ1, σ2) = (6,1), (γ1, γ2) = (0.5, 0.5), and ρ12 = 0.5.

https://doi.org/10.1371/journal.pone.0180405.t003

Table 4. Sample size and assurance probabilities for observing criteria (i), (ii), and (iii) given α = 0.025, β = 0.1, (Δ1, Δ2) = (3,0.45), (σ1, σ2) = (6,1), (γ1, γ2) = (0.5, 0.5), and ρ12 = 0.7.

https://doi.org/10.1371/journal.pone.0180405.t004

In Tables 1–4, we see the following phenomena. First, as p1 increases, the assurance probability of Criterion (i) increases. This is because, as p1 increases, the observed overall results Dk are increasingly dominated by the observed results from the first region, D1k. Second, as p1 increases, the assurance probability of Criterion (ii) first increases and then decreases. This occurs because the observed results from regions other than the first region are gradually dominated by the D1k at first and are then completely dominated by them as p1 increases further. Also, the assurance probability of Criterion (iii) increases when p1 increases, since the hi decrease as p1 increases.

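Another feature we observed is that AP1 > AP2 in Tables 1–4, given p1 and γk. This is because the overall estimate Dk is a weighted average of D1k and the estimate from the other regions, so requiring D1k to exceed a fraction γk of Dk is a weaker condition than requiring it to exceed the same fraction of the estimate from the other regions alone.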
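The ordering of AP1 and AP2 can be checked with a short argument. The following is a sketch under the assumption of equal allocation, so that the overall estimate is the weighted average in the first line; the symbol $D_{(-s)k}$ for the pooled estimate from the regions other than s and the constant $\gamma_k'$ are notation introduced here for illustration.

```latex
% Assumption: D_k = p_s D_{sk} + (1 - p_s) D_{(-s)k}, with 0 < p_s < 1 and 0 < \gamma_k < 1.
\begin{align*}
D_{sk} > \gamma_k D_k
  &\iff (1 - \gamma_k p_s)\, D_{sk} > \gamma_k (1 - p_s)\, D_{(-s)k} \\
  &\iff D_{sk} > \gamma_k'\, D_{(-s)k},
  \qquad \gamma_k' = \frac{\gamma_k (1 - p_s)}{1 - \gamma_k p_s} < \gamma_k .
\end{align*}
% Case D_{(-s)k} \ge 0: D_{sk} > \gamma_k D_{(-s)k} \ge \gamma_k' D_{(-s)k}, so Criterion (ii) implies Criterion (i).
% Case D_{(-s)k} < 0 and D_k > 0: then D_{sk} > D_k > \gamma_k D_k, so Criterion (i) holds regardless.
```

Since overall significance forces every Dk to be positive, Criterion (ii) implies Criterion (i) on that event for each endpoint, which gives AP2 ≤ AP1 under these assumptions.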

As Ikeda and Bretz [23] suggested, for Criterion (iii) it may be possible to link the choice of ϕ to (γ1, γ2) = (0.5, 0.5) in order to ensure the same level of strictness as Criterion (i). For example, in Table 1, setting ϕ = 0.17 in Criterion (iii) would ensure a level of strictness close to that of Criterion (i) with p1 = 0.4. Another point we wish to make is that the assurance probabilities of all criteria increase as ρ12 increases. This makes intuitive sense because the two co-primary endpoints behave more alike as their correlation increases.

Numerical example

In this section, we provide an example to illustrate a practical application of our method. A randomized, double-blind, active-controlled MRCT will be conducted in patients with mild to moderate AD for comparing a new treatment and a placebo control. In this trial, patients aged 50 or older with a diagnosis of uncomplicated AD are planned to be recruited from three regions: Taiwan, the European Union, and the United States. The primary endpoints are the change from baseline in ADAS-cog at week 24 and the CIBIC-plus value at week 24. Based on the results observed in a previous exploratory study, the differences between the test drug and placebo in the change from baseline in ADAS-cog score and in the CIBIC-plus at week 24 are expected to be 2.88 and 0.44, respectively. The standard deviations of the change from baseline in ADAS-cog score and of the CIBIC-plus at week 24 are each assumed to be equal in the two groups, at 6.15 and 0.92, respectively. With ρ12 = 0, 0.3, 0.5, 0.8, α = 0.025, and β = 0.1, the sample sizes required per group determined by (1) and (2) are as follows:

In addition to demonstrating an overall treatment effect from all regions, the sponsor is also interested in assessing whether the overall results from the multi-regional trial can be bridged to Taiwan if the overall treatment effect shows statistical significance. In this regard, the proportion of the patients recruited in Taiwan needs to be determined during the design phase of the trial to preserve the probability of establishing consistency between Taiwan and all other regions. Suppose that similarity Criterion (i) is used, and that γ1 = 0.5 and γ2 = 0.5 are chosen. To ensure that the assurance probability AP1 reaches the 80% level, the sample sizes required per group, nS, from Taiwan patients with respect to ρ12 = 0, 0.3, 0.5, and 0.8 are shown below:

Conclusions

The aim of an MRCT is to show the efficacy of a drug in various global regions, and simultaneously to evaluate the possibility of applying the overall trial results to each region. However, in MRCTs sponsors are challenged by how to demonstrate consistency between a specific region and the overall results. In this paper, three criteria have been established to assess the similarity between a specific region and the overall regions in an MRCT with multiple co-primary endpoints. Regulators and sponsors can easily adopt these criteria to conduct statistical assessments of the consistency of treatment effects between the specific region and the entire trial, and consequently to help registration of the new drug in the specific region.

On the other hand, the 11th Q&A for ICH E5 states, “It may be desirable in certain situations to achieve the goal of bridging by conducting a multi-regional trial under a common protocol that includes sufficient numbers of patients from each of multiple regions to reach a conclusion about the effect of the drug in all regions.” Therefore, the sample size determination for each region is another challenge for regulators and sponsors. With the three criteria we established, the sample size required for a specific region can easily be determined so that there is a high probability of observing a consistent trend in treatment effect between the specific region and the entire MRCT. In this paper, we do not particularly recommend any criterion for evaluating the consistency of treatment effects between the entire region and the specific region.

Although our approach is easy to use, the selection of the magnitude γi of the consistency trend raises an important issue. In this regard, the Japanese guidance suggests that the magnitude be 0.5 or greater for the first criterion when the MRCT has only one primary endpoint. Our suggestion is that the determination of γi should be discussed between the regulatory agency in the specific region and the trial sponsor. Most importantly, all differences in race, diet, environment, culture, and medical practice among regions should be considered.

It should be noted that, in our approach, the sample size calculation for the specific region did not have a closed-form expression. For conducting an MRCT with only one primary endpoint, Ikeda and Bretz [23] discussed the methods proposed in the Japanese regulatory guidance document and derived closed-form expressions for the resulting probabilities, which required the evaluation of multivariate normal or t probabilities between the overall effect and the effect in Japan. In addition, they proposed a different method of calculating the probability of observing a consistent trend based on Method 1 in the Japanese regulatory guidance. Ikeda and Bretz’s work is worthy of being extended to the MRCT with multiple co-primary endpoints.

When more than one primary endpoint is viewed as important in a clinical trial, a decision must be made as to whether it is desirable to evaluate the joint effects on all of the endpoints or on at least one of them. This decision defines the alternative hypothesis to be tested and provides a framework for trial design. This article discusses only the former situation, where a trial is designed to evaluate the joint effects of a new treatment compared to a control treatment on all of the primary endpoints, as seen in AD clinical trials. The latter situation, i.e., designing the trial to evaluate an effect on at least one of the primary endpoints, is referred to as “multiple primary endpoints” (Offen et al. [15]), and many methods for dealing with such multiple primary endpoints have been proposed (e.g., see the extensive references in Dmitrienko et al. [24]). As with multiple co-primary endpoints, the power for detecting an effect on at least one endpoint, which is called “disjunctive power” (Senn and Bretz [19]) or “minimal power” (Westfall et al. [20]), can be defined and extended.

Another issue we want to point out is that, in this paper, the outcome variances are assumed to be known for the sample size calculation. In actual practice, the outcome variances are not known and should be estimated from data. In fact, an extensive literature of results from similar trials may exist, and thus the variability associated with the primary endpoints can often be found in the literature. For methods with unknown variance, the major change is that the power function is evaluated based on a non-central multivariate t-distribution. For clinical trials with multiple co-primary endpoints, Sozu et al. [18] discussed a method for the unknown variance case and showed that the calculated sample size is nearly equivalent to that for the known variance in the setting of 80% or 90% power at the 2.5% significance level for a one-sided test. They showed that the sample size per group calculated using the method based on the unknown variance generally needs one more subject than that calculated using the method based on the known variance. This is very similar to the result observed in the single primary endpoint case. Therefore, sample size calculation based on a known variance provides a reasonable approximation for the unknown variance case.

Similarly, the correlation is usually unknown and thus must be estimated by (1) using data from pilot studies or preceding clinical trials (e.g., Phase II trials), or by (2) borrowing information from external existing data when incorporating correlation into the sample size calculation. In some disease areas, the correlations among the endpoints are known. For example, Offen et al. [15] provide a list of disease areas for which the regulatory agency requires co-primary endpoints when evaluating the effects of a new treatment; the list includes possible correlations among endpoints for each disease area.

The proposed criteria can be extended from one to multiple regions. For example, after the MRCT has demonstrated a statistically significant overall treatment effect, we can bridge the results of the MRCT to all regions if $D_{ik} > \gamma_{ik} D_k$ for all i = 1,…,M and k = 1,…,K. Here $\gamma_{ik}$ represents the threshold of consistency trend for the kth endpoint in the ith region. Our research work here assumes that the effect size for each co-primary endpoint and the correlations among endpoints are both uniform across regions. Since MRCTs recruit subjects from many countries around the world, it might be expected that there is a difference in treatment effect or in correlations among endpoints due to regional differences (e.g., ethnic differences). Thus, the sample size calculation for MRCTs based on the assumption that the effect size for each co-primary endpoint and the correlations among endpoints are uniform across regions might be impractical. Future work is being pursued to address this issue.

Supporting information

S1 File. Derivations of the three assurance probabilities AP1, AP2, and AP3.

https://doi.org/10.1371/journal.pone.0180405.s001

(PDF)

S2 File. Codes of R software for AP1, AP2, and AP3.

https://doi.org/10.1371/journal.pone.0180405.s002

(PDF)

Acknowledgments

The work in this paper was supported by a grant from the Ministry of Science and Technology, Taiwan (MOST103-2118-M-400-004). Thanks are due to two referees for their detailed, constructive, and thoughtful comments and suggestions, which we believe have led to significant improvements to this paper.

References

  1. International Conference on Harmonization. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. Q&A for the ICH E5 Guideline on Ethnic Factors in the Acceptability of Foreign Data. 2006. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E5_R1/Q_As/E5_Q_As__R5_.pdf. Accessed on May 3, 2017.
  2. International Conference on Harmonization. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. ICH E17 Guideline on General principle on planning/designing Multi-Regional Clinical Trials. 2016. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E17/E17_Step2.pdf. Accessed on May 3, 2017.
  3. Ministry of Health, Labour and Welfare of Japan. Basic Principles on Global Clinical Trials. 2007. http://www.pmda.go.jp/files/000153265.pdf. Accessed on May 3, 2017.
  4. Quan H, Zhao PL, Zhang J, Roessner M, Aizawa K. Sample size considerations for Japanese patients in a multi‐regional trial based on MHLW guidance. Pharmaceutical Statistics. 2010;9(2):100–112. pmid:19499510
  5. Kawai N, Chuang-Stein C, Komiyama O, Li Y. An approach to rationalize partitioning sample size into individual regions in a multiregional trial. Drug Information Journal. 2008;42(2):139–147.
  6. Tsou HH, James Hung HM, Chen YM, Huang WS, Chang WJ, Hsiao CF. Establishing consistency across all regions in a multi‐regional clinical trial. Pharmaceutical Statistics. 2012;11(4):295–299. pmid:22504851
  7. Uesaka H. Sample size allocation to regions in a multiregional trial. Journal of Biopharmaceutical Statistics. 2009;19(4):580–594. pmid:20183427
  8. Ko FS, Tsou HH, Liu JP, Hsiao CF. Sample size determination for a specific region in a multiregional trial. Journal of Biopharmaceutical Statistics. 2010;20(4):870–885. pmid:20496211
  9. Tsou HH, Chien TY, Liu JP, Hsiao CF. A consistency approach to evaluation of bridging studies and multi‐regional trials. Statistics in Medicine. 2011;30(17):2171–2186. pmid:21590701
  10. Chen CT, Hung HMJ, Hsiao CF. Design and evaluation of multiregional trials with heterogeneous treatment effect across regions. Journal of Biopharmaceutical Statistics. 2012;22(5):1037–1050. pmid:22946948
  11. Huang Y, Chang WJ, Hsiao CF. An empirical Bayes approach to evaluation of results for a specific region in multiregional clinical trials. Pharmaceutical Statistics. 2013;12(2):59–64. pmid:23319408
  12. Committee for Medicinal Products for Human Use (CHMP). Guideline on medicinal products for the treatment of Alzheimer’s disease and other dementias (CPMP/EWP/553/95 Rev.1). European Medical Agency, London, UK. 2008. http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500003562.pdf. Accessed on May 3, 2017.
  13. Food and Drug Administration. Guidance for Industry. Alzheimer’s disease: developing drugs for the treatment of early stage disease. Center for Drug Evaluation and Research, Food and Drug Administration, Rockville, MD, USA. 2013. http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm338287.pdf. Accessed on May 3, 2017.
  14. Hung HM, Wang SJ. Some controversial multiple testing problems in regulatory applications. Journal of Biopharmaceutical Statistics. 2009;19:1–11. pmid:19127460
  15. Offen W, Chuang-Stein C, Dmitrienko A, Littman G, Maca J, Meyerson L, et al. Multiple co-primary endpoints: medical and statistical solutions. Drug Information Journal. 2007;41:31–46.
  16. Sozu T, Sugimoto T, Hamasaki T, Evans SR. Sample size determination in clinical trials with multiple endpoints. Cham: Springer International Publishing; 2015.
  17. Sozu T, Kanou T, Hamada C, Yoshimura I. Power and sample size calculations in clinical trials with multiple primary variables. Japanese Journal of Biometrics. 2006;27:83–96.
  18. Sozu T, Sugimoto T, Hamasaki T. Sample size determination in superiority clinical trials with multiple co-primary correlated endpoints. Journal of Biopharmaceutical Statistics. 2011;21:650–668. pmid:21516562
  19. Senn S, Bretz F. Power and sample size when multiple endpoints are considered. Pharmaceutical Statistics. 2007; 161–170. pmid:17674404
  20. Westfall PH, Tobias RD, Wolfinger RD. Multiple comparisons and multiple tests using SAS. 2nd ed. Cary, NC: SAS Institute Inc.; 2011.
  21. Sugimoto T, Sozu T, Hamasaki T. A convenient formula for sample size calculations in clinical trials with multiple co-primary continuous endpoints. Pharmaceutical Statistics. 2012;11:118–128. pmid:22415870
  22. Hamasaki T, Sugimoto T, Evans SR, Sozu T. Sample size determination for clinical trials with co-primary outcomes: exponential event-times. Pharmaceutical Statistics. 2013;12:28–34. pmid:23081932
  23. Ikeda K, Bretz F. Sample size and proportion of Japanese patients in multi-regional trials. Pharmaceutical Statistics. 2010; 207–216. pmid:20872621
  24. Dmitrienko A, Tamhane AC, Bretz F. Multiple testing problems in pharmaceutical statistics. Boca Raton: Chapman & Hall/CRC; 2010.