## Abstract

Recently, multi-regional clinical trials (MRCTs), which incorporate subjects from many countries/regions around the world under the same protocol, have been widely conducted by many global pharmaceutical companies. The objective of such trials is to accelerate the development process for a drug and shorten the drug’s approval time in key markets. Several statistical methods have been proposed for the design and evaluation of MRCTs, as well as for assessing the consistency of treatment effects across all regions with one primary endpoint. However, in some therapeutic areas (e.g., Alzheimer’s disease), the clinical efficacy of a new treatment may be characterized by a set of possibly correlated endpoints, known as multiple co-primary endpoints. In this paper, we focus on a specific region and establish three statistical criteria for evaluating consistency between the specific region and overall results in MRCTs with multiple co-primary endpoints. More specifically, two of those criteria are used to assess whether the treatment effect in the region of interest is as large as that of the other regions or of the regions overall, while the third criterion is used to assess whether the treatment effect in the specific region achieves a pre-specified threshold. The sample size required for the region of interest can also be evaluated based on these three criteria.

**Citation: **Huang W-S, Hung H-N, Hamasaki T, Hsiao C-F (2017) Sample size determination for a specific region in multiregional clinical trials with multiple co-primary endpoints. PLoS ONE 12(6): e0180405. https://doi.org/10.1371/journal.pone.0180405

**Editor: **Tim Friede, University Medical Center Gottingen, GERMANY

**Received: **July 24, 2016; **Accepted: **June 15, 2017; **Published: ** June 30, 2017

**Copyright: ** © 2017 Huang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **All relevant data are within the paper and its Supporting Information files.

**Funding: **The work in this paper was supported by the grant MOST 103-2118-M-400-004- from the Ministry of Science and Technology, Taiwan. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** The authors have declared that no competing interests exist.

## Introduction

Recently, global drug development has attracted much attention from pharmaceutical companies. Unlike traditional clinical trials, an MRCT recruits subjects from many countries around the world under the same protocol, and this design has led to a new strategy for drug development. It has been widely adopted by global pharmaceutical companies, which seek simultaneous drug development, submission, and regulatory approval throughout key world markets to hasten the market availability of the drug, as well as to improve patient access to new and innovative treatments. However, a key issue for conducting MRCTs is how to demonstrate the efficacy of a drug in all participating regions while also evaluating the possibility of applying the overall trial results to each region. To address the difficulties related to global drug development, in 1998 the International Conference on Harmonization (ICH) published “Ethnic Factors in the Acceptability of Foreign Clinical Data”, known as the E5 guideline. The idea of an MRCT was first raised in the 11th Q&A of E5 [1]. In recent years, the trend toward simultaneous clinical development worldwide has been rising rapidly. To establish a framework for addressing this key issue by conducting an MRCT, the ICH released the draft E17 guideline “General principle on planning/designing Multi-Regional Clinical Trials” [2] in 2016 to describe general principles for the planning and the design of MRCTs; another aim of the work was to increase the acceptability of MRCTs in global regulatory submissions.

The Japanese Ministry of Health, Labour and Welfare issued its own guidance document on MRCTs, “Basic Principles on Global Clinical Trials” [3]. This guidance provided two methods as examples to determine the number of Japanese subjects required for establishing consistency in treatment effect between the Japanese group and the entire group. Let *D*_{Japan} and *D*_{All} represent the observed treatment effects for the Japanese group and the entire group. Method 1 in the Japanese guidance suggests that the sample size for Japan should fulfill

$$P\left(\frac{D_{\mathrm{Japan}}}{D_{\mathrm{All}}} > \pi\right) \ge \gamma.$$

On the other hand, suppose that an MRCT will be conducted in three regions. Let *D*_{i} represent the observed treatment effect for region *i*, *i* = 1,…,3. For Method 2, the sample size should be determined to satisfy

$$P\left(D_{1} > 0,\; D_{2} > 0,\; D_{3} > 0\right) \ge \gamma.$$

Note that the Japanese guidance requires that π be 0.5 or greater and that γ be 0.8 or greater.

Several different statistical approaches based on Methods 1 and 2 in the Japanese guidance have been developed. Quan et al. [4] calculated the sample size required for Japan in an MRCT with normal, binary, and survival endpoints based on Method 1. Kawai et al. [5] proposed an approach, based on Method 2, to allocate the total sample size to the regions so that a high probability of observing a consistent trend under the assumed treatment effect across regions can be obtained. In addition, consistency criteria different from those of the Japanese guidance have been established, such as those by Tsou et al. [6], Uesaka [7], Ko et al. [8], and Tsou et al. [9]. On the other hand, Chen et al. [10] and Huang et al. [11] considered ethnic differences and proposed methods that apply different treatment effects across regions to the design and evaluation of MRCTs.

However, most recent approaches to the design and evaluation of MRCTs are concerned with only one primary endpoint. In some therapeutic areas the clinical efficacy of a new treatment may be characterized by a set of possibly correlated endpoints, because there may be several different aspects to patients’ responses to that treatment. For example, a typical clinical trial for Alzheimer’s disease (AD) is usually conducted with cognitive, functional, and global endpoints to evaluate a symptomatic improvement in the dementia caused by the disease. The Committee for Medicinal Products for Human Use (CHMP) [12] and the Food and Drug Administration (FDA) [13] have recommended using two of these three as co-primary endpoints in the development of drugs for the treatment of AD. Clinical trials with “co-primary” endpoints are designed to evaluate whether the effect of a test treatment is superior (or non-inferior) to the control on all primary endpoints; failure to demonstrate superiority on any single endpoint implies that superiority to the control treatment cannot be concluded. These endpoints are classified as follows:

- objective cognitive tests, e.g., the AD Assessment Scale cognitive subscale (ADAS-cog) and Severe Impairment Battery (SIB);
- self-care and activities of daily living, e.g., the AD Cooperative Study Activities of Daily Living (ADCS-ADL) and its modified version for severe AD; and
- global assessment of change, such as the Clinician’s Interview Based Impression of Change-plus (CIBIC-plus) and the Clinical Global Impression of Improvement (CGI-I).

Having such multiple endpoints raises difficulties for statisticians in handling multiplicity in the design and analysis of clinical trials, specifically in controlling Type I and Type II error rates when the endpoints are potentially correlated. When designing a trial to evaluate joint effects on all endpoints, as seen in AD clinical trials, no adjustment is needed to control the Type I error rate. However, the Type II error rate increases as the number of endpoints to be evaluated increases. This situation is referred to as “multiple co-primary endpoints” and it is related to the intersection-union problem (Hung and Wang [14]; Offen et al. [15]). In many such trials, the sample size is often unnecessarily large, which results in complications. To overcome this issue, many authors have recently discussed the design and analysis of trials with co-primary endpoints using fixed-sample-size designs; the extensive references in Offen et al. [15] and Sozu et al. [16] provide many examples.

In this paper, we will focus on the design and evaluation of an MRCT with multiple co-primary endpoints. As we know, the aim of an MRCT is to show the efficacy of a drug in various global regions, and concurrently to evaluate the possibility of applying the overall trial results to each region. Therefore, we will also consider the determination of the number of subjects in a specific region to establish the consistency of treatment effects between the specific region and the entire group.

This paper is organized as follows. In section 2, we demonstrate the sample size calculation for multiple endpoints with correlation. In section 3, we establish three criteria to assess the consistency of treatment effects between a specific region and the entire group in MRCTs with multiple endpoints. Under each criterion, the sample size required for the region of interest is also evaluated. An example is provided in section 4. A discussion is given in section 5.

## Material and methods

### Sample size calculation

For simplicity, we focus on the most fundamental situation, where an MRCT is designed to evaluate superiority over a placebo control on *K* (≥ 2) continuous co-primary efficacy endpoints, and the effect size for each co-primary endpoint is assumed to be uniform across *M* (≥ 2) regions. Let *X*_{ikj} and *Y*_{ikl} be the efficacy responses on the *k*th co-primary endpoint for the *j*th subject and for the *l*th subject in the *i*th region receiving the test product and the placebo control, respectively, *i* = 1,…,*M*, *j* = 1,…,*N*_{i}^{T}, *l* = 1,…,*N*_{i}^{C}, and *k* = 1,…,*K*. Let **X**_{ij} = (*X*_{i1j}, *X*_{i2j},…, *X*_{iKj})^{T} and **Y**_{il} = (*Y*_{i1l}, *Y*_{i2l},…, *Y*_{iKl})^{T} be the outcome vectors of the *K* co-primary endpoints for the *j*th subject and the *l*th subject in the *i*th region receiving the test product and the placebo control, respectively, *j* = 1,…,*N*_{i}^{T}, *l* = 1,…,*N*_{i}^{C}.

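Since the effect size for each co-primary endpoint is uniform across regions, we can assume that **X**_{ij} and **Y**_{il} have multivariate normal (MVN) distributions with population mean vectors $\boldsymbol{\mu}^{T} = (\mu_{1}^{T},\ldots,\mu_{K}^{T})^{\top}$ and $\boldsymbol{\mu}^{C} = (\mu_{1}^{C},\ldots,\mu_{K}^{C})^{\top}$, respectively (the superscripts T and C on the means label the test product and the placebo control, and $\top$ denotes the transpose), and a known common covariance matrix **Σ** = (*ρ*_{kk′}*σ*_{k}*σ*_{k′}), where (*a*_{kk′}) denotes the matrix whose (*k*,*k*′)th element is *a*_{kk′}, *ρ*_{kk′} = *corr*(*X*_{ikj}, *X*_{ik′j}) = *corr*(*Y*_{ikl}, *Y*_{ik′l}), *k* ≠ *k*′, and $\sigma_{k}^{2} = \mathrm{var}(X_{ikj}) = \mathrm{var}(Y_{ikl})$. Here, we assume that the outcome variances are known, although in actual practice they are usually unknown and must be estimated from data. Let $\Delta_{k} = \mu_{k}^{T} - \mu_{k}^{C}$ for *k* = 1,…,*K*. Here a higher value of the population mean for each co-primary endpoint represents a better outcome. Consequently, the hypothesis testing for multiple co-primary endpoints is given as

$$\mathrm{H}_{0}: \Delta_{k} \le 0 \ \text{for at least one } k \quad \text{vs.} \quad \mathrm{H}_{A}: \Delta_{k} > 0 \ \text{for all } k. \tag{1}$$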

The null hypothesis H_{0} can be conveniently expressed as a union of a family of hypotheses. The hypothesis for each co-primary endpoint is tested at the same significance level *α* with H_{0k}: Δ_{k} ≤ 0 vs. H_{Ak}: Δ_{k} > 0, and the null hypothesis H_{0} is rejected if and only if every null hypothesis H_{0k} is rejected, so that the test for multiple co-primary endpoints has significance level *α*. Although the hypothesis is one-sided, the proposed method can be straightforwardly extended to the two-sided hypothesis. Let

$$\bar{X}_{k} = \frac{\sum_{i=1}^{M}\sum_{j=1}^{N_{i}^{T}} X_{ikj}}{\sum_{i=1}^{M} N_{i}^{T}}, \qquad \bar{Y}_{k} = \frac{\sum_{i=1}^{M}\sum_{l=1}^{N_{i}^{C}} Y_{ikl}}{\sum_{i=1}^{M} N_{i}^{C}}$$

for *k* = 1,…,*K*. Also let

$$Z_{k} = \frac{\bar{X}_{k} - \bar{Y}_{k}}{\sigma_{k}\sqrt{\left(\sum_{i=1}^{M} N_{i}^{T}\right)^{-1} + \left(\sum_{i=1}^{M} N_{i}^{C}\right)^{-1}}}$$

for *k* = 1,…,*K*. Subsequently, we will reject H_{0} at the *α* level of significance if

$$Z_{k} > z_{1-\alpha} \quad \text{for all } k = 1,\ldots,K,$$

where *z*_{1−α} is the 100(1 − *α*)th percentile of the standard normal distribution.
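The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming known standard deviations and data already pooled over regions; the function name `co_primary_test` and its argument layout are hypothetical and are not taken from the paper or its supporting files.

```python
from math import sqrt
from statistics import NormalDist, fmean


def co_primary_test(test, control, sigmas, alpha=0.025):
    """Intersection-union test for K co-primary endpoints with known SDs.

    test, control: lists of per-subject outcome tuples (one value per
    endpoint), pooled over all regions (hypothetical data layout).
    sigmas: assumed known standard deviation of each endpoint.
    Returns (z_statistics, reject_h0).
    """
    n_t, n_c = len(test), len(control)
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # z_{1-alpha}
    z_stats = []
    for k, sigma in enumerate(sigmas):
        mean_t = fmean(subject[k] for subject in test)
        mean_c = fmean(subject[k] for subject in control)
        z_stats.append((mean_t - mean_c) / (sigma * sqrt(1.0 / n_t + 1.0 / n_c)))
    # H0 is rejected only if every endpoint-wise statistic exceeds z_{1-alpha}
    reject = all(z > z_crit for z in z_stats)
    return z_stats, reject
```

Because every endpoint is tested at the full level *α* and all must succeed, no multiplicity adjustment appears in the code; a single non-significant endpoint is enough to keep H_{0}.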

Let $N^{T} = \sum_{i=1}^{M} N_{i}^{T}$ and $N^{C} = \sum_{i=1}^{M} N_{i}^{C}$. In the design stage we assume equally sized groups, i.e., *N*^{T} = *N*^{C} = *N*. Let **Z** = (*Z*_{1},…,*Z*_{K})^{T}. Then, under the alternative hypothesis, **Z** is distributed as an MVN with mean vector $\sqrt{N/2}\,\boldsymbol{\delta}$ and covariance matrix **ρ** = (*ρ*_{kk′}), where **δ** = (Δ_{1}/*σ*_{1},…,Δ_{K}/*σ*_{K})^{T}.

Using the result in Sozu et al. [17,18], the power for rejecting the null hypothesis H_{0} can be written as

$$1 - \beta = P\left(\bigcap_{k=1}^{K}\left\{Z_{k} > z_{1-\alpha}\right\} \,\middle|\, \boldsymbol{\delta}\right).$$

This power is referred to as “conjunctive power” (Senn and Bretz [19]) or “complete power” (Westfall et al. [20]). The sample size required for achieving the desired power of 1 − *β* at the significance level of *α* for the one-sided test is the minimum *N* that satisfies

$$\int_{z_{1-\alpha}}^{\infty}\cdots\int_{z_{1-\alpha}}^{\infty} f\left(z_{1},\ldots,z_{K} \,\middle|\, \sqrt{N/2}\,\boldsymbol{\delta},\, \boldsymbol{\rho}\right) dz_{1}\cdots dz_{K} \ge 1 - \beta, \tag{2}$$

where $f(z_{1},\ldots,z_{K} \mid \sqrt{N/2}\,\boldsymbol{\delta}, \boldsymbol{\rho})$ represents the density of the MVN with mean $\sqrt{N/2}\,\boldsymbol{\delta}$ and covariance matrix **ρ** corresponding to *z*_{1},…,*z*_{K}. An iterative procedure is required to find the required sample size. The easiest way is a grid search that increases *N* gradually until the power under *N* exceeds the desired power of 1 − *β*, where the maximum of the sample sizes separately calculated for each endpoint can be used as the initial value. However, this often takes much computing time. To improve the speed of the sample size calculation, Sugimoto et al. [21] and Hamasaki et al. [22] provide more efficient and practical algorithms for calculating the sample sizes. Also note that since the effect size for each co-primary endpoint is assumed to be uniform across regions, there is no difference between sample size calculations for clinical trials with co-primary endpoints conducted in multiple regions and those conducted in a single region.
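The grid search described above can be sketched for the special case *K* = 2. The code below is an illustration only: it assumes |*ρ*_{12}| < 1, replaces the multivariate integral in (2) by a one-dimensional Simpson's rule over the conditional normal distribution of *Z*_{2} given *Z*_{1}, and is not the algorithm of Sugimoto et al. [21] or Hamasaki et al. [22]. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def power_two_endpoints(n, delta1, delta2, rho, alpha=0.025, steps=4000):
    """Conjunctive power P(Z1 > z, Z2 > z) for K = 2 with n subjects per group.

    delta1, delta2 are standardized effect sizes; rho is the correlation
    between endpoints (|rho| < 1); steps must be even (Simpson's rule).
    """
    z_crit = STD.inv_cdf(1.0 - alpha)
    m1 = sqrt(n / 2.0) * delta1
    m2 = sqrt(n / 2.0) * delta2
    s = sqrt(1.0 - rho * rho)

    def integrand(u):
        # u = z1 - m1; given Z1, Z2 is normal with mean m2 + rho*u and SD s
        return STD.pdf(u) * STD.cdf((m2 + rho * u - z_crit) / s)

    lower = z_crit - m1
    upper = max(lower, 0.0) + 10.0  # the normal density is negligible beyond
    h = (upper - lower) / steps
    total = integrand(lower) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * h)
    return total * h / 3.0


def sample_size_two_endpoints(delta1, delta2, rho, alpha=0.025, beta=0.1):
    """Smallest n per group with conjunctive power of at least 1 - beta."""
    z_sum = STD.inv_cdf(1.0 - alpha) + STD.inv_cdf(1.0 - beta)
    # Start from the largest single-endpoint sample size, as the text suggests
    n = max(ceil(2.0 * (z_sum / d) ** 2) for d in (delta1, delta2))
    while power_two_endpoints(n, delta1, delta2, rho, alpha) < 1.0 - beta:
        n += 1
    return n
```

For example, `sample_size_two_endpoints(0.5, 0.45, 0.1)` corresponds to the setting **δ** = (0.5, 0.45)^{T}, *ρ*_{12} = 0.1, *α* = 0.025, and *β* = 0.1, for which the paper reports 117 subjects per group. For *K* > 2 a multivariate normal integration routine would be needed instead of the one-dimensional reduction used here.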

### Applying the results of the MRCT to a specific region

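ICH E17 states that MRCTs should investigate not only consistency in treatment effects across populations but also treatment effects in overall populations. That is, the aim of an MRCT is to show the efficacy of a drug in various global regions, and concurrently to evaluate the possibility of applying the overall trial results to each region. Suppose that we are interested in judging whether a treatment is effective in a specific region, say the *s*th region, where 1 ≤ *s* ≤ *M*. For the *k*th co-primary endpoint, let *D*_{ik} be the observed mean difference in the *i*th region, *D*_{s^{c}k} be the observed mean difference from regions other than the *s*th region, and *D*_{k} be the observed mean difference from all regions. That is, with $\bar{X}_{ik}$ and $\bar{Y}_{ik}$ denoting the sample means of the *k*th endpoint in the *i*th region for the test product and the placebo control,

$$D_{ik} = \bar{X}_{ik} - \bar{Y}_{ik}, \qquad D_{s^{c}k} = \frac{\sum_{i \ne s} N_{i}^{T}\bar{X}_{ik}}{\sum_{i \ne s} N_{i}^{T}} - \frac{\sum_{i \ne s} N_{i}^{C}\bar{Y}_{ik}}{\sum_{i \ne s} N_{i}^{C}},$$

and

$$D_{k} = \frac{\sum_{i=1}^{M} N_{i}^{T}\bar{X}_{ik}}{\sum_{i=1}^{M} N_{i}^{T}} - \frac{\sum_{i=1}^{M} N_{i}^{C}\bar{Y}_{ik}}{\sum_{i=1}^{M} N_{i}^{C}}.$$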
Given that the overall result is significant at the α level, we establish the following criteria to judge whether the treatment is effective in the *s*th region:

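- (i) *D*_{s1} > *γ*_{1}*D*_{1}, …, *D*_{sK} > *γ*_{K}*D*_{K} for 0 < *γ*_{i} < 1, *i* = 1,…,*K*;
- (ii) *D*_{s1} > *γ*_{1}*D*_{s^{c}1}, …, *D*_{sK} > *γ*_{K}*D*_{s^{c}K} for 0 < *γ*_{i} < 1, *i* = 1,…,*K*, where *D*_{s^{c}k} is the observed mean difference for the *k*th endpoint from regions other than the *s*th region;
- (iii) *D*_{s1} > *h*_{1}, …, *D*_{sK} > *h*_{K} for *h*_{i} > 0, *i* = 1,…,*K*.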

Here, the first two criteria evaluate whether the treatment effect in the region of interest is as large as (i) that of the regions overall and (ii) that of the other regions. Note that Criterion (i) assures that the estimated efficacy within a specific region is not smaller than a pre-specified portion of the global effect estimator.

When the sample size for the specific region is sufficiently large, the overall results will be dominated by the specific region. In this case, consistency is easier to claim with Criterion (i) than with Criterion (ii). Therefore, Criterion (ii) tends to be more conservative than Criterion (i).

It should be noted that Criterion (i) is similar to Method 1 in the Japanese guidance. As indicated by Ikeda and Bretz [23], despite observing better results in both the entire population and the specific subpopulation, consistency sometimes cannot be claimed with Method 1. A similar undesirable characteristic also exists for our Criterion (ii). Therefore, Ikeda and Bretz [23] suggested an alternative to Method 1. Let *p*_{s} denote the proportion of patients out of 2*N* in the *s*th region. If we set $h_{i} = z_{1-\phi_{i}}\,\sigma_{i}\sqrt{2/(p_{s}N)}$ for given values *ϕ*_{i}, Criterion (iii) is similar to the alternative method established by Ikeda and Bretz [23]. Here *ϕ*_{i} can be thought of as the desired significance level for performing a hypothesis test comparing the test product and the placebo control for the *i*th endpoint within patients from the specific region.
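To make the role of *ϕ*_{i} concrete, the thresholds of Criterion (iii) can be computed directly. The sketch below assumes the form *h*_{i} = *z*_{1−ϕ_i} *σ*_{i} √(2/(*p*_{s}*N*)), i.e., the critical value of a one-sided level-*ϕ*_{i} test based on the *p*_{s}*N* subjects per group in the region; the function name `regional_thresholds` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def regional_thresholds(sigmas, phis, p_s, n_per_group):
    """Thresholds h_i for Criterion (iii), one per co-primary endpoint.

    Assumed form: h_i = z_{1-phi_i} * sigma_i * sqrt(2 / (p_s * N)),
    where p_s * N is the number of subjects per group in region s.
    """
    n_region = p_s * n_per_group
    return [
        NormalDist().inv_cdf(1.0 - phi) * sigma * sqrt(2.0 / n_region)
        for sigma, phi in zip(sigmas, phis)
    ]


# Illustrative inputs: sigma = (6, 1), phi = 0.15 for both endpoints, N = 117
for p in (0.1, 0.2, 0.4):
    print(p, regional_thresholds((6.0, 1.0), (0.15, 0.15), p, 117))
```

The thresholds shrink as *p*_{s} grows, which is why a larger regional share makes Criterion (iii) easier to satisfy for a fixed *ϕ*_{i}.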

### Sample size determination for a specific region

In the design stage, once N has been determined, special consideration should be placed on the determination of the number of subjects from the specific region in the MRCT. Per ICH E17, one important issue for conducting MRCTs is that the sample size allocation of regions should be determined such that clinically meaningful differences in treatment effects among regions can be described. Since analyses of the data from a specific region in the MRCT may not have enough statistical power, the number of subjects required for the specific region should be large enough to establish the consistency of treatment effects between the specific region and the regions overall. In this regard, ICH E17 has provided five approaches that can be considered for allocating the overall sample size to regions. Briefly, the first approach is to determine the regional sample sizes such that similar trends in treatment effects across regions can be demonstrated. The second approach is to determine the sample size needed in one or more regions such that the region-specific treatment effect preserves some pre-specified proportion of the overall treatment effect. The third approach is to enrol subjects in proportion to region size. The fourth approach is to determine the regional sample sizes so that significant results within one or more regions can be achieved. The last approach is to require a fixed minimum number of subjects in one or more regions.

In this section we suggest that, similar to the second approach suggested by ICH E17, the sample size be selected so that the assurance probability of the consistency criterion in (i), (ii), or (iii), given **δ** and given that the overall result is significant at the α level, is maintained at a desired level, say 80%.

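Let *p*_{i} denote the proportion of patients out of 2*N* in the *i*th region, *i* = 1,…,*M*, where $\sum_{i=1}^{M} p_{i} = 1$. Also let *N*_{i} be the number of patients per group in the *i*th region. That is, *N*_{i} = *p*_{i}*N*. The assurance probabilities of Criteria (i)–(iii), given **δ**, can be represented by

$$\begin{aligned} AP_{1} &= P_{\boldsymbol{\delta}}\left(D_{s1} > \gamma_{1}D_{1},\ldots,D_{sK} > \gamma_{K}D_{K} \mid Z_{1} > z_{1-\alpha},\ldots,Z_{K} > z_{1-\alpha}\right), \\ AP_{2} &= P_{\boldsymbol{\delta}}\left(D_{s1} > \gamma_{1}D_{s^{c}1},\ldots,D_{sK} > \gamma_{K}D_{s^{c}K} \mid Z_{1} > z_{1-\alpha},\ldots,Z_{K} > z_{1-\alpha}\right), \end{aligned} \tag{3}$$

and

$$AP_{3} = P_{\boldsymbol{\delta}}\left(D_{s1} > h_{1},\ldots,D_{sK} > h_{K} \mid Z_{1} > z_{1-\alpha},\ldots,Z_{K} > z_{1-\alpha}\right),$$

where *P*_{δ} is the probability measure with respect to **δ** and $D_{s^{c}k}$ is the observed mean difference for the *k*th endpoint from regions other than the *s*th region. Here we need to determine *p*_{s} to ensure that the assurance probabilities of Criteria (i)–(iii) given **δ** are maintained at a desired level, say 80%. These assurance probabilities can be calculated directly from standard normal distributions after some algebraic manipulation; the details of the derivations of *AP*_{1}–*AP*_{3} are given in S1 and S2 Files.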
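As a cross-check on such calculations, the three assurance probabilities can also be approximated by simulation. The following is a Monte Carlo sketch for *K* = 2, not the derivation in S1 File or the R code in S2 File. It assumes equal allocation (*N*_{i} = *p*_{i}*N* per group), known variances, |*ρ*_{12}| < 1, and thresholds *h*_{i} = *z*_{1−ϕ_i} *σ*_{i} √(2/(*p*_{s}*N*)); all names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def assurance_probabilities(deltas, sigmas, rho, n, p_s, gammas, phis,
                            alpha=0.025, draws=50_000, seed=1):
    """Monte Carlo estimates of (AP1, AP2, AP3) for K = 2 endpoints.

    deltas: unstandardized mean differences; sigmas: known SDs;
    n: subjects per group overall; p_s: share of subjects in region s.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    scale_s = sqrt(2.0 / (p_s * n))          # SE factor in region s
    scale_c = sqrt(2.0 / ((1.0 - p_s) * n))  # SE factor in the other regions
    scale_all = sqrt(2.0 / n)                # SE factor for the overall result
    h = [NormalDist().inv_cdf(1.0 - phi) * sig * scale_s
         for phi, sig in zip(phis, sigmas)]
    root = sqrt(1.0 - rho * rho)

    def mean_difference(scale):
        # Two correlated normal errors built from independent draws
        e1 = rng.gauss(0.0, 1.0)
        e2 = rho * e1 + root * rng.gauss(0.0, 1.0)
        return [deltas[0] + sigmas[0] * scale * e1,
                deltas[1] + sigmas[1] * scale * e2]

    significant = hits1 = hits2 = hits3 = 0
    for _ in range(draws):
        d_s = mean_difference(scale_s)
        d_c = mean_difference(scale_c)
        d_all = [p_s * a + (1.0 - p_s) * b for a, b in zip(d_s, d_c)]
        # Condition on a significant overall result for every endpoint
        if not all(d / (sig * scale_all) > z_crit
                   for d, sig in zip(d_all, sigmas)):
            continue
        significant += 1
        hits1 += all(a > g * d for a, g, d in zip(d_s, gammas, d_all))
        hits2 += all(a > g * b for a, g, b in zip(d_s, gammas, d_c))
        hits3 += all(a > t for a, t in zip(d_s, h))
    return tuple(x / significant for x in (hits1, hits2, hits3))
```

A call such as `assurance_probabilities((3, 0.45), (6, 1), 0.1, 117, 0.1, (0.5, 0.5), (0.15, 0.15))` mirrors the illustrative setting used in the results section; the estimates carry simulation error, so they should only be compared with exact values to about two decimal places.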

## Results and discussion

### Required sample sizes and assurance probabilities

Without loss of generality, we assume that we want to see whether the overall results can apply to the first region, i.e., *s* = 1. To illustrate our approach, let *K* = 2 and assume that (Δ_{1}, Δ_{2}) = (3, 0.45) and that (*σ*_{1}, *σ*_{2}) = (6, 1). That is, **δ** = (0.5, 0.45)^{T}. With *α* = 0.025, *β* = 0.1, (*γ*_{1}, *γ*_{2}) = (0.5, 0.5), and *ϕ*_{1} = *ϕ*_{2} = *ϕ*, Tables 1–4 exhibit the total sample size required per group and the assurance probabilities of Criteria (i)–(iii) for *ρ*_{12} = 0.1, 0.3, 0.5, and 0.7, respectively, with various values of *p*_{1}. In Table 1, the total sample size required per group for the MRCT is 117 for *ρ*_{12} = 0.1, calculated from formulas (1) and (2). The first line in Table 1 indicates that if the proportion of patients from the first region out of the total number of patients in the study is 0.10, the assurance probabilities of Criteria (i) and (ii) are 0.55 and 0.53, respectively, while the assurance probabilities for Criterion (iii) with *h*_{i} corresponding to *ϕ* = 0.15 and *ϕ* = 0.30 are 0.33 and 0.57, respectively. From Table 1, to achieve an assurance probability at the 80% level, the sample size for the first region has to be around 40% of the overall sample size for Criterion (i), and around 60% for Criterion (ii). On the other hand, the assurance probabilities of Criterion (iii) reach 80% when the values of *p*_{1} are 40% and 30% for *ϕ* = 0.15 and *ϕ* = 0.30, respectively. Note that the sample size required per group is the minimum *N* satisfying (1) and (2); therefore, the assurance probabilities for Criteria (i), (ii), and (iii) increase further if the sample size in a practical trial is larger than *N*.

In Tables 1–4, we see the following phenomena. First, as *p*_{1} increases, the assurance probability of Criterion (i) increases. This is because, as *p*_{1} increases, the observed overall results *D*_{k} are increasingly dominated by the observed results from the first region, *D*_{1k}. Second, as *p*_{1} increases, the assurance probability of Criterion (ii) first increases and then decreases. This occurs because the observed results from regions other than the first region are gradually dominated by the *D*_{1k} at first and are then completely dominated by the *D*_{1k} as *p*_{1} increases further. Also, the assurance probability of Criterion (iii) increases when *p*_{1} increases, since the *h*_{i} decrease as *p*_{1} increases.

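Another feature we observed is that *AP*_{1} > *AP*_{2} in Tables 1, 2, 3 and 4, given *p*_{1} and *γ*_{k}. This is due to the fact that, given a significant overall result (so that *D*_{k} > 0),

$$\left\{D_{1k} > \gamma_{k} D_{1^{c}k}\right\} \subseteq \left\{D_{1k} > \gamma_{k} D_{k}\right\},$$

since

$$D_{k} = p_{1} D_{1k} + (1 - p_{1}) D_{1^{c}k},$$

where $D_{1^{c}k}$ denotes the observed mean difference for the *k*th endpoint from regions other than the first region.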
As Ikeda and Bretz [23] suggested, for Criterion (iii) it may be possible to link the choice of *ϕ* to (*γ*_{1}, *γ*_{2}) = (0.5, 0.5) in order to ensure the same level of strictness as Criterion (i). For example, in Table 1, setting *ϕ* = 0.17 in Criterion (iii) would give a level of strictness close to that of Criterion (i) with *p*_{1} = 0.4. Another point we wish to make is that the assurance probabilities of all criteria increase as *ρ*_{12} increases. This makes intuitive sense because the two co-primary endpoints then behave more alike.

### Numerical example

In this section, we provide an example to illustrate a practical application of our method. A randomized, double-blind, active-controlled MRCT will be conducted in patients with mild to moderate AD for comparing a new treatment and a placebo control. In this trial, patients aged 50 or older with a diagnosis of uncomplicated AD are planned to be recruited from three regions: Taiwan, the European Union, and the United States. The primary endpoints are the change from baseline in ADAS-cog at week 24 and the CIBIC-plus value at week 24. Based on the results observed in a previous exploratory study, the differences between the test drug and placebo in the change from baseline in ADAS-cog score and in the CIBIC-plus at week 24 are expected to be 2.88 and 0.44, respectively. The standard deviations of the change from baseline in ADAS-cog score and of the CIBIC-plus at week 24 are assumed to be equal in both groups, at 6.15 and 0.92, respectively. With *ρ*_{12} = 0, 0.3, 0.5, 0.8, *α* = 0.025, and *β* = 0.1, the sample sizes required per group determined by (1) and (2) are as follows:
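For the independent case *ρ*_{12} = 0 only, the power in (2) factorizes into a product of univariate normal probabilities, so that case can be checked with a short sketch. The code below assumes known variances and equal allocation and uses the design values quoted above; the function names are hypothetical, and no output values are asserted here.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def power_independent(n, deltas, alpha=0.025):
    """Conjunctive power when the co-primary endpoints are uncorrelated."""
    z_crit = STD.inv_cdf(1.0 - alpha)
    power = 1.0
    for delta in deltas:
        # Endpoint-wise power of the one-sided z-test with n per group
        power *= STD.cdf(sqrt(n / 2.0) * delta - z_crit)
    return power


def sample_size_independent(deltas, alpha=0.025, beta=0.1):
    """Smallest n per group with product-form power of at least 1 - beta."""
    z_sum = STD.inv_cdf(1.0 - alpha) + STD.inv_cdf(1.0 - beta)
    # Start from the largest single-endpoint sample size
    n = max(ceil(2.0 * (z_sum / d) ** 2) for d in deltas)
    while power_independent(n, deltas, alpha) < 1.0 - beta:
        n += 1
    return n


# Standardized effect sizes from the design assumptions in the example
deltas = (2.88 / 6.15, 0.44 / 0.92)
n0 = sample_size_independent(deltas)
print(n0, round(power_independent(n0, deltas), 4))
```

Positive correlation between the endpoints raises the conjunctive power for a given *N*, so the *ρ*_{12} = 0 result serves as an upper bound on the sample sizes for the positively correlated cases.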

In addition to demonstrating an overall treatment effect from all regions, the sponsor is also interested in assessing whether the overall results from the multi-regional trial can be bridged to Taiwan if the overall treatment effect shows statistical significance. In this regard, the proportion of the patients recruited in Taiwan needs to be determined during the design phase of the trial to preserve the probability of establishing consistency between Taiwan and all other regions. Suppose that similarity Criterion (i) is used, and that *γ*_{1} = 0.5 and *γ*_{2} = 0.5 are chosen. To ensure that the assurance probability *AP*_{1} is at the 80% level, the sample sizes required per group, *n*_{S}, from Taiwan patients with respect to *ρ*_{12} = 0, 0.3, 0.5, and 0.8 are shown below:

## Conclusions

The aim of an MRCT is to show the efficacy of a drug in various global regions, and simultaneously to evaluate the possibility of applying the overall trial results to each region. However, in MRCTs sponsors are challenged by how to demonstrate consistency between a specific region and the overall results. In this paper, three criteria have been established to assess the similarity between a specific region and the overall regions in an MRCT with multiple co-primary endpoints. Regulators and sponsors can easily adopt these criteria to conduct statistical assessments of the consistency of treatment effects between the specific region and the entire trial, and consequently to help registration of the new drug in the specific region.

On the other hand, the 11th Q&A for ICH E5 states, “It may be desirable in certain situations to achieve the goal of bridging by conducting a multi-regional trial under a common protocol that includes sufficient numbers of patients from each of multiple regions to reach a conclusion about the effect of the drug in all regions.” Therefore, the sample size determination for each region is another challenge for regulators and sponsors. With the three criteria we established, the sample size required for a specific region can easily be determined so that there is a high probability of observing a consistent trend in treatment effect between the specific region and the entire MRCT. In this paper, we do not particularly recommend any criterion for evaluating the consistency of treatment effects between the entire region and the specific region.

Although our approach is easy to use, the selection of the magnitude *γ*_{i} of the consistency trend raises an important issue. In this regard, the Japanese guidance suggests that the magnitude be 0.5 or greater for the first criterion when the MRCT has only one primary endpoint. Our suggestion is that the determination of *γ*_{i} should be discussed between the regulatory agency in the specific region and the trial sponsor. Most importantly, all differences in race, diet, environment, culture, and medical practice among regions should be considered.

It should be noted that, in our approach, the sample size calculation for the specific region does not have a closed-form expression. For an MRCT with only one primary endpoint, Ikeda and Bretz [23] discussed the methods proposed in the Japanese regulatory guidance document and derived closed-form expressions for the resulting probabilities, which required the evaluation of multivariate normal or *t* probabilities between the overall effect and the effect in Japan. In addition, they proposed a different method of calculating the probability of observing a consistent trend based on Method 1 in the Japanese regulatory guidance. Ikeda and Bretz’s work is worth extending to MRCTs with multiple co-primary endpoints.

When more than one primary endpoint is viewed as important in a clinical trial, a decision must be made as to whether it is desirable to evaluate the effects on at least one or on all of the endpoints. This decision defines the alternative hypothesis to be tested and provides a framework for trial design. This article discusses only the situation where a trial is designed to evaluate the joint effects of a new treatment compared to a control treatment on all of the primary endpoints, as seen in AD clinical trials. The other situation, i.e., designing the trial to evaluate an effect on at least one of the primary endpoints, is referred to as “multiple primary endpoints” (Offen et al. [15]), and many methods for dealing with such multiple primary endpoints have been proposed (e.g., see the extensive references in Dmitrienko et al. [24]). As with multiple co-primary endpoints, the power for detecting an effect on at least one endpoint, which is called “disjunctive power” (Senn and Bretz [19]) or “minimal power” (Westfall et al. [20]), can be defined and extended.

Another issue we want to point out is that in this paper the outcome variances are assumed to be known for the sample size calculation. In actual practice, the outcome variances are not known and should be estimated from data. In fact, extensive literature on the results of similar trials may exist, and thus the variability associated with the primary endpoints can often be found in the literature. For methods with unknown variance, the major change is that the power function is evaluated based on a non-central multivariate *t*-distribution. For clinical trials with multiple co-primary endpoints, Sozu et al. [18] discussed a method for the unknown variance case and showed that the calculated sample size is nearly equivalent to that for the known variance case in the setting of 80% or 90% power at the 2.5% significance level for a one-sided test. They showed that the sample size per group calculated using the method based on the unknown variance generally needs one more subject than that calculated using the method based on the known variance. This is very similar to the result observed in the single primary endpoint case. Therefore, sample size calculation based on a known variance provides a reasonable approximation for the unknown variance case.

Similarly, the correlation is usually unknown and thus must be estimated by (1) using data from pilot studies or preceding clinical trials (e.g., Phase II trials), or by (2) borrowing information from external existing data when incorporating correlation into the sample size calculation. In some disease areas, the correlations among the endpoints are known. For example, Offen et al. [15] provide a list of disease areas for which the regulatory agency requires co-primary endpoints when evaluating the effects of a new treatment; the list includes possible correlations among endpoints for each disease area.

The proposed criteria can be extended from one to multiple regions. For example, after the MRCT has demonstrated a statistically significant overall treatment effect, we can bridge the results of the MRCT to all regions if

$$D_{ik} > \gamma_{ik} D_{k} \quad \text{for all } i = 1,\ldots,M \text{ and } k = 1,\ldots,K.$$

Here $\gamma_{ik}$ represents the threshold of consistency trend for the *k*th endpoint in the *i*th region. Our research work here assumes that the effect size for each co-primary endpoint and the correlations among endpoints are both uniform across regions. Since MRCTs recruit subjects from many countries around the world, it might be expected that there is a difference in treatment effect or in correlations among endpoints due to regional differences (e.g., ethnic differences). Thus, the sample size calculation for MRCTs based on the assumption that the effect size for each co-primary endpoint and the correlations among endpoints are uniform across regions might be impractical. Future work is being pursued to address this issue.

## Supporting information

### S1 File. Derivations of the three assurance probabilities *AP*_{1}, *AP*_{2}, and *AP*_{3}.

https://doi.org/10.1371/journal.pone.0180405.s001

(PDF)

### S2 File. Codes of R software for *AP*_{1}, *AP*_{2}, and *AP*_{3}.

https://doi.org/10.1371/journal.pone.0180405.s002

(PDF)

## Acknowledgments

The work in this paper was supported by a grant from the Ministry of Science and Technology, Taiwan (MOST103-2118-M-400-004). Thanks are due to two referees for their detailed, constructive, and thoughtful comments and suggestions, which we believe have led to significant improvements to this paper.

## References

- 1.
International Conference on Harmonization. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. Q&A for the ICH E5 Guideline on Ethnic Factors in the Acceptability of Foreign Data. 2006. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E5_R1/Q_As/E5_Q_As__R5_.pdf. Accessed on May 3, 2017.
- 2. International Conference on Harmonization. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. ICH E17 Guideline on General principle on planning/designing Multi-Regional Clinical Trials. 2016. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E17/E17_Step2.pdf. Accessed on May 3, 2017.
- 3. Ministry of Health, Labour and Welfare of Japan. Basic Principles on Global Clinical Trials. 2007. http://www.pmda.go.jp/files/000153265.pdf. Accessed on May 3, 2017.
- 4. Quan H, Zhao PL, Zhang J, Roessner M, Aizawa K. Sample size considerations for Japanese patients in a multi‐regional trial based on MHLW guidance. Pharmaceutical Statistics. 2010;9(2):100–112. pmid:19499510
- 5. Kawai N, Chuang-Stein C, Komiyama O, Li Y. An approach to rationalize partitioning sample size into individual regions in a multiregional trial. Drug Information Journal. 2008;42(2):139–147.
- 6. Tsou HH, James Hung HM, Chen YM, Huang WS, Chang WJ, Hsiao CF. Establishing consistency across all regions in a multi‐regional clinical trial. Pharmaceutical Statistics. 2012; 11(4): 295–299. pmid:22504851
- 7. Uesaka H. Sample size allocation to regions in a multiregional trial. Journal of Biopharmaceutical Statistics. 2009; 19(4): 580–594. pmid:20183427
- 8. Ko FS, Tsou HH, Liu JP, Hsiao CF. Sample size determination for a specific region in a multiregional trial. Journal of Biopharmaceutical Statistics. 2010; 20(4): 870–885. pmid:20496211
- 9. Tsou HH, Chien TY, Liu JP, Hsiao CF. A consistency approach to evaluation of bridging studies and multi‐regional trials. Statistics in Medicine. 2011; 30(17): 2171–2186. pmid:21590701
- 10. Chen CT, Hung HMJ, Hsiao CF. Design and evaluation of multiregional trials with heterogeneous treatment effect across regions. Journal of Biopharmaceutical Statistics. 2012; 22(5): 1037–1050. pmid:22946948
- 11. Huang Y, Chang WJ, Hsiao CF. An empirical Bayes approach to evaluation of results for a specific region in multiregional clinical trials. Pharmaceutical Statistics. 2013; 12(2): 59–64. pmid:23319408
- 12. Committee for Medicinal Products for Human Use (CHMP). Guideline on medicinal products for the treatment of Alzheimer’s disease and other dementias (CPMP/EWP/553/95 Rev.1). European Medicines Agency, London, UK. 2008. http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500003562.pdf. Accessed on May 3, 2017.
- 13. Food and Drug Administration. Guidance for Industry. Alzheimer’s disease: developing drugs for the treatment of early stage disease. Center for Drug Evaluation and Research, Food and Drug Administration, Rockville, MD, USA. 2013. http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm338287.pdf. Accessed on May 3, 2017.
- 14. Hung HM, Wang SJ. Some controversial multiple testing problems in regulatory applications. Journal of Biopharmaceutical Statistics. 2009; 19:1–11. pmid:19127460
- 15. Offen W, Chuang-Stein C, Dmitrienko A, Littman G, Maca J, Meyerson L, et al. Multiple co-primary endpoints: medical and statistical solutions. Drug Information Journal. 2007;41:31–46.
- 16. Sozu T, Sugimoto T, Hamasaki T, Evans SR. Sample size determination in clinical trials with multiple endpoints. Cham: Springer International Publishing; 2015.
- 17. Sozu T, Kanou T, Hamada C, Yoshimura I. Power and sample size calculations in clinical trials with multiple primary variables. Japanese Journal of Biometrics. 2006; 27:83–96.
- 18. Sozu T, Sugimoto T, Hamasaki T. Sample size determination in superiority clinical trials with multiple co-primary correlated endpoints. Journal of Biopharmaceutical Statistics. 2011; 21:650–668. pmid:21516562
- 19. Senn S, Bretz F. Power and sample size when multiple endpoints are considered. Pharmaceutical Statistics. 2007; 161–170. pmid:17674404
- 20. Westfall PH, Tobias RD, Wolfinger RD. Multiple comparisons and multiple tests using SAS. 2nd ed. Cary, NC: SAS Institute Inc.; 2011.
- 21. Sugimoto T, Sozu T, Hamasaki T. A convenient formula for sample size calculations in clinical trials with multiple co-primary continuous endpoints. Pharmaceutical Statistics. 2012; 11:118–128. pmid:22415870
- 22. Hamasaki T, Sugimoto T, Evans SR, Sozu T. Sample size determination for clinical trials with co-primary outcomes: exponential event-times. Pharmaceutical Statistics. 2013; 12:28–34. pmid:23081932
- 23. Ikeda K, Bretz F. Sample size and proportion of Japanese patients in multi-regional trials. Pharmaceutical Statistics. 2010; 207–216. pmid:20872621
- 24. Dmitrienko A, Tamhane AC, Bretz F. Multiple testing problems in pharmaceutical statistics. Boca Raton: Chapman & Hall/CRC; 2010.