A Practical Simulation Method to Calculate Sample Size of Group Sequential Trials for Time-to-Event Data under Exponential and Weibull Distribution

Group sequential designs have been widely applied in clinical trials in the past few decades, and sample size estimation is a vital concern of sponsors and investigators. In survival group sequential trials it is a particularly thorny question because of the ambiguous distributional form of the data, censoring and the different definition of information time. A practical and easy-to-use simulation-based method for multi-stage two-arm survival group sequential designs is proposed in this article, and its SAS program is available. Besides the exponential distribution, which is usually assumed for survival data, the Weibull distribution is considered here. Incorporating the probability of discontinuation into the simulation leads to a more accurate estimate. The assessment indexes calculated in the simulation help determine the number and timing of the interim analyses. The use of the method in survival group sequential trials is illustrated, and the effects of varying the shape parameter on the sample size under the Weibull distribution are explored with an example. Based on the simulation results, a method is proposed to estimate the shape parameter of the Weibull distribution from the median survival time of the test drug and the hazard ratio, both of which are prespecified by the investigators and other participants. 10+ simulations are recommended to achieve a robust estimate of the sample size. Furthermore, the method remains applicable in adaptive designs if the strategy of sample size scheme determination is adopted at the design stage or minor modifications are made to the program.


Introduction
In the past few decades, group sequential design has been widely used in clinical trials. The sample size calculation is of particular importance when designing a group sequential trial: an adequate sample size guarantees the power to detect the difference, while an excessively large sample size is undesirable because of study time and costs. Various methods, such as the Pocock design, the O'Brien-Fleming design and the error-spending function, were proposed to design group sequential trials [1][2][3], and the corresponding approaches to sample size estimation for continuous and dichotomous variables were also discussed [4][5]. Compared with normal and binary data, sample size estimation for time-to-event data is more complicated because of its ambiguous distributional form and censored data. However, time-to-event outcomes are frequently employed in clinical trials. For example, endpoints such as overall survival (OS), progression-free survival (PFS) and time to progression (TTP) are usually adopted as primary endpoints in oncology trials, survival analysis is necessary to analyze such data, and the group sequential design is often indispensable in oncology trials for ethical reasons. Therefore, it is helpful to have a practical and simple method to calculate the sample size when a group sequential design is applied in clinical trials with a time-to-event endpoint.
For survival data, the exponential and Weibull distributions are the two most frequently used parametric models. Of the two, the Weibull distribution is in most cases more appropriate for describing time-to-event data because, besides the scale parameter that the exponential distribution also contains, it includes a shape parameter. The shape parameter makes it possible to describe a hazard rate that varies over the study interval, which is common in medical studies. Recently, several approaches have been proposed to estimate or re-estimate the sample size for group sequential and adaptive designs [6][7][8][9][10][11][12][13], but few of them consider the Weibull distribution. Only Murphy et al [6], Togo et al [7] and Lu et al [9] discussed the effects of varied shape parameters under the Weibull distribution on the sample size in two-stage clinical trials; most of these methods were introduced under the assumption of the exponential distribution [8][10][11][12][13]. The Weibull distribution is seldom employed in commercial tools either. For example, PASS is only able to calculate the sample size for survival data under the exponential distribution [14]. Though Heo et al proposed a formula to calculate the sample size for survival data under the Weibull distribution [15], it applies only to traditional single-stage trials and cannot be used in multi-stage group sequential designs. Furthermore, the approaches proposed in the literature are usually based on particular statistical tests or trial procedures and are sometimes complicated to apply in actual clinical trials. On the other hand, the Monte Carlo simulation-based approach is an easy and flexible way to estimate the sample size for a trial with a particular design, procedure, statistical test and target power [16].
Changes to the trial procedure and other demands on the trial can be conveniently incorporated into the simulation so that an accurate sample size is calculated. The simulation-based method is also generally adopted in some commercial software packages, such as PASS and nQuery, but the Weibull distribution, which is more suitable for describing survival data, is not usually considered in them. Therefore, a practical and simple method, which employs Monte Carlo simulation to calculate the sample size for multi-stage two-arm group sequential trials with a time-to-event endpoint, is proposed in this paper and its SAS program has been developed. Here, both the exponential and Weibull distributions are considered and the simulation is based on the log-rank test. The probability of discontinuation in the trial is also included, which leads to a more accurate estimate. Besides the recommended sample size, a series of assessment indexes is calculated in the program to determine the optimal plan of the group sequential trial. To support calling the program for sample size determination, a method of estimating the shape parameter from the hazard ratio and the median survival time is suggested from a practical perspective when the Weibull distribution is assumed. Moreover, the generalization of the method to adaptive designs is discussed.

Parametric distributions of time-to-event data
Because various physical causes result in the occurrence of a specified event, and it is impossible to isolate these causes and account mathematically for all of them, the theoretical distribution of a time-to-event outcome is difficult to define in clinical trials [17]. This brings difficulties to the sample size determination and the statistical analysis of survival data. The common parametric distributions for survival data include the exponential, Weibull, gamma, log-normal, log-logistic, normal, exponential power, Gompertz, inverse Gaussian, Pareto and generalized gamma distributions, etc. [17]. Of them, the exponential and Weibull distributions are the two most important distributional forms for modeling survival data and are frequently employed to build parametric survival models.
The exponential distribution is the simplest and most important parametric distribution for time-to-event data and is often said to describe a purely random failure pattern. It was first proposed for modeling failure data by Davis [18], and Epstein and Sobel [19][20] discussed why it should be preferred over the then-popular normal distribution for describing survival data. It still plays a major role in modeling time-to-event data because of its simplicity. The exponential distribution is characterized by a constant hazard rate λ; a high hazard rate indicates high risk and short survival. Let t be the independent continuous time variable. For survival data following the exponential distribution, the hazard rate at time t is

h(t) = λ

and the survival rate at time t is

S(t) = exp(−λt).

Here, λ denotes the scale parameter of the exponential distribution. However, the constant hazard rate of the exponential distribution entails the "lack of memory" property, which is often too restrictive when describing time-to-event data in clinical trials. Unlike the exponential distribution, the Weibull distribution allows a varying hazard rate by including a shape parameter c in addition to the scale parameter λ. The distribution and its applicability to failure data were introduced by Weibull [21][22]. It is thought to be more appropriate than the exponential distribution for modeling time-to-event data because of its varying hazard rate. Survival time following the Weibull distribution has the hazard function

h(t) = λct^(c−1)

and the survival function

S(t) = exp(−λt^c).

Here, t is the independent continuous time variable, and λ and c denote the scale and shape parameters of the Weibull distribution respectively. The hazard rate increases with t when c > 1 and decreases when c < 1. The exponential distribution is the special case of the Weibull distribution with c = 1, for which the hazard rate is constant.
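As a concrete illustration of the two distributional forms (coded here in Python for brevity; the method itself is implemented as a SAS macro, and the parameter values below are arbitrary examples), the hazard h(t) = λct^(c−1) and survival S(t) = exp(−λt^c) of the Weibull distribution reduce to the exponential case when c = 1:

```python
import math

def weibull_hazard(t, lam, c):
    """Hazard function h(t) = lam * c * t**(c-1) of the Weibull distribution."""
    return lam * c * t ** (c - 1)

def weibull_survival(t, lam, c):
    """Survival function S(t) = exp(-lam * t**c) of the Weibull distribution."""
    return math.exp(-lam * t ** c)

lam = 0.2  # arbitrary scale parameter for illustration

# With c = 1 the Weibull reduces to the exponential: h(t) = lam, S(t) = exp(-lam*t)
for t in (0.5, 1.0, 2.0):
    assert abs(weibull_hazard(t, lam, 1.0) - lam) < 1e-12
    assert abs(weibull_survival(t, lam, 1.0) - math.exp(-lam * t)) < 1e-12

# c > 1: hazard increases over time; c < 1: hazard decreases over time
assert weibull_hazard(2.0, lam, 2.0) > weibull_hazard(1.0, lam, 2.0)
assert weibull_hazard(2.0, lam, 0.5) < weibull_hazard(1.0, lam, 0.5)
```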
Compared with other parametric survival distributions, the exponential and Weibull distributions are of particular importance and are generally applied in modeling time-to-event data in clinical trials. Although the simple form of the exponential distribution makes it easy to fit parametric models for survival outcomes, the Weibull distribution has the advantage of a varying hazard rate when describing time-to-event data in medical studies. Therefore, both the exponential and Weibull distributions are employed to estimate the sample size of group sequential designs for time-to-event endpoints in this article.

Group sequential design of time-to-event data
Compared with traditional trial designs, group sequential design allows early stopping for efficacy/futility based on the results of interim analyses, which can effectively improve trial efficiency, shorten trial duration and save trial costs. As a result, it has attracted much attention from trial participants and has been widely applied in clinical trials. When designing a group sequential trial, it is essential for biostatisticians to choose the optimal number of trial stages, the interim time schedule and the stopping boundaries of the interim analyses [23]. To preserve the type I error probability in a group sequential design, many methods, e.g. the Pocock design, the O'Brien-Fleming design and the error spending function [1][2][3], have been introduced to calculate the boundaries for early stopping for efficacy. The Pocock method spreads the probability of early stopping evenly over the interim analyses, while the O'Brien-Fleming method is conservative at the early interim time points. It inflates the traditional single-stage sample size so little that it has become one of the most widely used methods in group sequential design [24]. The error spending approach regards the trial as a process of consuming the type I error rate. An increasing function α(t), 0 ≤ t ≤ 1, which characterizes the rate at which the type I error rate is spent, has to be prespecified for a particular trial, and several such functions were proposed by Lan et al [3]. Here, t denotes the fraction of the total information, no matter whether information time or calendar time is employed in the trial [25]. The method is equivalent to the Pocock design when α(t) = α ln(1 + (e − 1)t) and to the O'Brien-Fleming design when α(t) = 2 − 2Φ(z_(α/2)/√t). Based on these methods, it is possible to perform extensive simulations to determine the optimal interim monitoring plan, including the number of trial stages and the time schedule of the interim analyses, and to calculate the sample size.
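The two spending functions named above can be written down directly. The following sketch (in Python, with α = 0.05 as an example value) checks that each function spends exactly α at information time t = 1 and that the O'Brien-Fleming-type function is the more conservative one early in the trial:

```python
import math
from statistics import NormalDist

N = NormalDist()  # standard normal distribution

def pocock_spending(t, alpha=0.05):
    """Lan-DeMets spending function equivalent to the Pocock design:
    alpha(t) = alpha * ln(1 + (e - 1) * t)."""
    return alpha * math.log(1.0 + (math.e - 1.0) * t)

def obf_spending(t, alpha=0.05):
    """Lan-DeMets spending function equivalent to the O'Brien-Fleming design:
    alpha(t) = 2 - 2 * Phi(z_{alpha/2} / sqrt(t))."""
    z = N.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * N.cdf(z / math.sqrt(t))

# Both spend the full type I error rate at information time t = 1
assert abs(pocock_spending(1.0) - 0.05) < 1e-9
assert abs(obf_spending(1.0) - 0.05) < 1e-9
# The O'Brien-Fleming-type function spends far less alpha early on
assert obf_spending(0.25) < pocock_spending(0.25)
```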
Although the methods mentioned above apply equally to survival trials, there are still some differences between trials with survival endpoints and trials with other types of endpoints. The definition of information time in survival group sequential trials differs from that in trials with a normal or binary endpoint: a subject provides full information to a survival trial only when the event occurs. The information time at an interim analysis is therefore the proportion of the maximum number of events already observed, not the proportion of subjects who have completed the trial [25]. Consequently, sample size estimation in survival trials means calculating the necessary number of events for the subsequent stages [26]. Though the number of events in a trial guarantees the test power, investigators and sponsors are, for practical reasons, more interested in the number of subjects to be accrued. The transformation from the number of events to the number of subjects is therefore essential for the sample size estimation of survival trials, and censoring in survival data makes it more complicated still. Therefore, a Monte Carlo simulation-based method is employed and its SAS tool has been developed in this article. In the simulation, all these factors are incorporated into the sample size estimation: both the number of events to be observed and the number of subjects to be accrued are estimated. Moreover, besides the total sample size, a series of assessment indexes is calculated to help determine the optimal interim monitoring plan.

The simulation-based method for sample size estimation
Here, we consider a two-arm group sequential trial with a survival outcome. The hypothesis to be tested is

H0: M_T = M_C versus H1: M_T ≠ M_C,

where M_T and M_C denote the median survival times of the treatment and control groups respectively. Under the proportional hazards assumption, the hypothesis is equivalent to H0: h = 1 versus H1: h ≠ 1, where h is the hazard ratio. Assuming that the trial is divided into k stages, the i-th (i = 1,2,…,k−1) interim analysis is performed after the i-th stage of the trial and the final analysis is implemented when all patients complete the trial. At the i-th interim analysis, the trial stops early for efficacy if the derived p_i < α_i, where α_i denotes the nominal significance level of the i-th stage; otherwise, the trial continues to the subsequent stage. Early stopping for futility is not considered in this article. At the final analysis, the trial is considered successful if H0 is rejected and failed if H0 is accepted.
To calculate the sample size of a survival group sequential trial, the trial parameters related to the trial size have to be prespecified besides the overall type I error rate α and the overall test power 1 − β. First of all, the median survival times M_T, M_C and the hazard ratio h directly contribute to the size of the trial. Here, the hazard ratio can be derived as h = (M_T/M_C)^c under the proportional hazards assumption, where c is the shape parameter of the Weibull distribution. For fixed c, a higher hazard ratio leads to a smaller sample size and a lower ratio to a larger one. M_T and M_C determine the scale parameters of the treatment and control groups respectively under the prespecified survival distribution. Secondly, the maximum observed time T of a subject is related to the sample size: the longer the patients are followed up in a trial, the smaller the required sample size. Thirdly, the interim monitoring plan, including the number of trial stages k, the interim time schedule t_i and the nominal significance levels α_i of the interim analyses, affects the trial size. Compared with the traditional single-stage design, the multiple interim analyses of a group sequential design inflate the sample size for a given type I error rate and test power: the more stages are planned in a trial, the larger the required sample size. The degree of sample size inflation also depends on the nominal significance levels of the interim analyses. As mentioned above, the O'Brien-Fleming boundaries result in the smallest sample size among the available approaches. The nominal significance levels can also be determined by extensive simulations, as long as the overall type I error rate is well controlled. In addition, the survival distributional form is another important trial parameter for sample size estimation. Here, both the exponential and Weibull distributions are considered.
The survival distributional form has to be prespecified to calculate the sample size in the simulation, and when the Weibull distribution is assumed, the magnitude of the shape parameter has to be defined at the same time. The drop-out rate and the sample size ratio of the two groups are also necessary for the calculation. In the simulation, it is presumed that the subjects are enrolled in sequence and that a subject enters the study only when the former one has finished the trial. A subject is considered to have finished the trial if he/she dies, discontinues or has been followed up for the prespecified longest duration. Under this presumption, only the subjects who have finished the trial by the interim analysis are included for data analysis, and no one is still at risk in the cohort at that time. Therefore, the accrual information, including the accrual time, accrual rate and accrual distribution, has no effect on the sample size and need not be defined in the simulation.
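Under the parameterization above, the medians determine the scale parameters through S(M) = 0.5, i.e. λ = ln 2 / M^c, and a common shape c across the two groups gives the hazard ratio h = (M_T/M_C)^c. A brief Python sketch (using the example medians M_C = 4.5 and M_T = 6 months that appear later in the Results):

```python
import math

def weibull_scale(median, c):
    """Scale parameter lam solving S(median) = exp(-lam * median**c) = 0.5."""
    return math.log(2) / median ** c

def hazard_ratio(m_t, m_c, c):
    """h = (M_T / M_C)**c, i.e. the ratio lam_C / lam_T under equal shape c."""
    return (m_t / m_c) ** c

m_c, m_t = 4.5, 6.0                      # medians in months, from the example
lam_c = weibull_scale(m_c, 1.0)          # exponential special case, c = 1
lam_t = weibull_scale(m_t, 1.0)

# The hazard ratio equals the ratio of scale parameters under equal shape
assert abs(hazard_ratio(m_t, m_c, 1.0) - lam_c / lam_t) < 1e-9
# A larger shape parameter yields a larger hazard ratio when M_T > M_C are fixed
assert hazard_ratio(m_t, m_c, 2.0) > hazard_ratio(m_t, m_c, 1.0)
```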
In addition to the number of events to be observed and the number of subjects to be accrued in a trial, a series of assessment indexes are calculated to help determine the optimal interim monitoring plan in the simulation. They are listed and explained as follows.
(i) The stage-wise empirical power Power_i (i = 1,2,…,k) and the cumulative empirical power cPower_i at the i-th stage. The stage-wise empirical power at the i-th stage is defined as the proportion of the simulated trials that reject the null hypothesis at the i-th stage when H1 is true and is calculated by Power_i = N_i^+/N, where N denotes the number of simulated trials and N_i^+ is the number of trials that succeed at the i-th interim analysis. It reflects the probability of early stopping for efficacy at the i-th stage. The cumulative empirical power is calculated by cPower_i = Σ_{m=1}^{i} Power_m and equals the overall empirical power when i = k. It should be noted that Power_i is just the stage-wise empirical type I error rate at the i-th stage when M_T = M_C is assumed, because it is then the proportion of trials that reject the null hypothesis at the i-th stage when H0 is true. Equally, cPower_i becomes the cumulative empirical type I error rate under the null hypothesis. Therefore, as long as the various M_T and M_C are prespecified, both the empirical power and the type I error rate can be calculated with the same program.
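Both indexes reduce to simple counting over the simulated trials. A minimal Python sketch (the stopping stages below are hypothetical values, not actual simulation output):

```python
def empirical_power(stop_stage, k):
    """Stage-wise (Power_i) and cumulative (cPower_i) empirical power.

    stop_stage: for each simulated trial, the stage (1..k) at which H0 was
    rejected, or 0 if the trial never rejected H0.
    """
    n = len(stop_stage)
    power = [stop_stage.count(i) / n for i in range(1, k + 1)]
    cpower = []
    total = 0.0
    for p in power:                 # cPower_i = sum of Power_1..Power_i
        total += p
        cpower.append(total)
    return power, cpower

# 10 hypothetical simulated trials of a 3-stage design (0 = never rejected)
stops = [1, 0, 2, 3, 1, 0, 2, 1, 3, 0]
power, cpower = empirical_power(stops, 3)
assert power == [0.3, 0.2, 0.2]         # N_i^+ / N for each stage
assert abs(cpower[-1] - 0.7) < 1e-12    # overall empirical power
```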
(ii) The expected number of events to be observed in a trial. For a k-stage group sequential trial with a time-to-event outcome, let d_i (i = 1,2,…,k) be the number of events to be observed at the i-th stage. The expected number of events to be observed in a trial is calculated by

E(D) = Σ_{i=1}^{k−1} D_i·Power_i + D_k·(1 − cPower_{k−1}),

where D_i = Σ_{m=1}^{i} d_m denotes the cumulative number of events up to the i-th stage. Compared with the total number of events D = Σ_{i=1}^{k} d_i, E(D) takes account of both the number of events to be observed and the probability of early stopping for efficacy, so a cost-effectiveness consideration is included when assessing and determining the optimal interim monitoring plan. Depending on the prespecified trial parameters, a Monte Carlo simulation based on the trial procedure is implemented to calculate the sample size, the number of events to be observed and the other assessment indexes. In the simulation, the log-rank test is employed as the survival analysis method at the interim and final analyses. All subjects who finish the trial before the i-th stage are pooled for the i-th interim analysis. Correspondingly, a SAS macro %n_gssur based on this simulation method was developed to calculate the sample size and assessment indexes for survival group sequential designs. The input macro parameters are listed in Table 1. The macro depends on two sub-macros, %exp_gen and %weibull_gen, in which the survival times are generated at random under the exponential and Weibull distribution respectively. The detailed program flow is designed according to the trial procedure and program requirements. It is described in detail as follows, and the flow chart is shown in Figure 1.
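One consistent reading of E(D), matching the definition of Power_i above (the final stage is reached whenever the trial does not stop early), can be sketched in Python as follows; the event counts and powers below are hypothetical illustrations:

```python
def expected_events(d, power):
    """Expected number of events E(D) for a k-stage group sequential design.

    d[i] is the number of additional events observed at stage i+1 and
    power[i] is the stage-wise empirical power Power_{i+1}; the trial runs
    to the final stage whenever it does not stop early for efficacy.
    """
    k = len(d)
    cum_d = [sum(d[: i + 1]) for i in range(k)]   # D_i, cumulative events
    p_reach_final = 1.0 - sum(power[: k - 1])     # 1 - cPower_{k-1}
    e = sum(cum_d[i] * power[i] for i in range(k - 1))
    e += cum_d[-1] * p_reach_final
    return e

d = [100, 100, 100]          # hypothetical events per stage, so D = 300
power = [0.2, 0.3, 0.3]      # hypothetical stage-wise empirical power
# E(D) = 100*0.2 + 200*0.3 + 300*(1 - 0.5) = 230, smaller than D = 300
assert abs(expected_events(d, power) - 230.0) < 1e-9
```

Because E(D) weights the cumulative event counts by the stopping probabilities, a plan with realistic chances of early stopping is rewarded by a smaller expected number of events.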
Step 1. Prepare for the Monte Carlo simulation. The necessary trial parameters for sample size estimation are obtained from the macro parameters shown in Table 1. The maximum follow-up time, the sample size ratio, the drop-out rate and the target power are defined according to the comments of the clinical investigators and sponsors, as described in the trial protocol. For the interim monitoring plan, including the number, timing and boundaries of the interim analyses, a series of scenarios are advised to be tried, and the optimal plan is chosen by comparing the assessment indexes mentioned above; this will be illustrated with an example in the Results. The distributional form of the time-to-event data is defined according to the statistician's assumption, and the shape parameter has to be given when the Weibull distribution is assumed, which is discussed in detail in the Results and Discussion.

Step 2. A series of loops are performed to search for the minimum sample size of the control group from the prespecified starting number until the target power or the ending number is reached. Let n_T and n_C be the sample sizes of the treatment and control groups respectively. The total sample size can be derived as n = n_C × (1 + r), where r denotes the sample size ratio n_T/n_C. Given n_T, n_C, the drop-out rate p and the survival rates S_T, S_C of the treatment and control groups at time T, the total number of preplanned events in the trial can be calculated by

D = [n_T × (1 − S_T) + n_C × (1 − S_C)] × (1 − p).  (6)

Here, the drop-out rates of the two groups are assumed to be equal. Depending on Eq.(6), the number of events that should be observed to achieve the target power is derived once the final sample size is found in the loops. In each loop, a simulated trial with the particular sample size is generated at random according to the prespecified survival distribution and analyzed following the trial procedure. N trials are simulated to calculate the empirical power and the other assessment indexes.
The detailed procedures are described in sequence as follows.
Step 2-1. Generate the random survival time T_j under the prespecified survival distribution. Equal shape parameters across the two groups are assumed when the Weibull distribution is considered. When the data are generated, λ is set to λ_T or λ_C according to the treatment assignment of the subject, where λ_T and λ_C denote the scale parameters of the treatment and control groups respectively. A subject who discontinues the trial is identified at random from a Bernoulli(p) distribution, where p denotes the drop-out rate of the trial and has to be prespecified in the macro parameters. Here, the subjects are assumed to discontinue completely at random and the drop-out rates are assumed equal across the two groups. The censored time of a discontinued subject is not shortened, because it is assumed that he/she would not have left the trial before the event happened, so the censored time should still come from the same survival distribution as that of a subject who finishes the trial.
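Step 2-1 amounts to inverse-transform sampling from S(t) = exp(−λt^c) plus a Bernoulli(p) dropout flag. A Python sketch (the seed and parameter values below are arbitrary, and the function name is illustrative, not part of the SAS macro):

```python
import math
import random

def simulate_subject(lam, c, dropout_p, rng):
    """One subject: a random Weibull survival time plus a Bernoulli(p) dropout
    flag. Per the method, a dropout keeps the same generated time (only the
    censoring indicator changes); the time itself is never shortened."""
    u = 1.0 - rng.random()                    # uniform on (0, 1]
    t = (-math.log(u) / lam) ** (1.0 / c)     # inverse-transform Weibull sample
    dropped = rng.random() < dropout_p
    return t, dropped

rng = random.Random(123)
lam = math.log(2) / 4.5   # exponential (c = 1) with median 4.5 months
times = [simulate_subject(lam, 1.0, 0.2, rng)[0] for _ in range(20000)]
times.sort()

# The empirical median should sit close to the target median of 4.5 months
assert abs(times[len(times) // 2] - 4.5) < 0.3
```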
Step 2-2. Simulate the procedure of the group sequential design, perform the interim/final analyses and calculate the assessment indexes, e.g., the overall empirical power, the stage-wise empirical power, the expected number of events, etc. These are completed by the following sub-steps.
Step 2-2-1. Screen the subjects for interim analyses. When d i events are observed in the trial, the i-th interim analysis will be performed. The subjects who finish the trial before the i-th stage are screened for data analysis at the i-th interim analysis if no positive result is derived at the (i-1)-th interim analysis. Due to the conservative presumption that the subject would enter the study only when the former one had finished it, no subject is at risk at the interim analysis in the simulation.
Step 2-2-2. Perform the log-rank tests at the interim analyses and the final analysis. If the derived p_i < α_i at the i-th interim analysis, the trial stops early for efficacy; otherwise, it proceeds to the subsequent stage. If no positive result is reached by the final analysis, the null hypothesis is accepted.
Step 2-2-3. Calculate the overall empirical power, Power i , cPower i and the expected number of events to be observed for the particular sample size.
Step 2-2-4. Clear the temporary datasets at the end of the loop.
Step 2-3. Judge whether the target power is achieved and the minimum sample size has been found. If the target power is achieved, the loops stop early and the final sample size is recommended; otherwise, the loops continue until the prespecified ending number, and the sample size search fails. However, because the calculated empirical power fluctuates with the trial size in the simulation, the simulation enters a verification stage to check the stability of the calculated power when the target power is achieved for the first time. The trial size at that point is denoted by n* and considered the initial estimate of the sample size of the control group. If the calculated empirical power is still larger than the target power for a trial size of n* + 1 at the verification stage, n* is taken as the recommended final sample size of the control group. Otherwise, the sample size of the control group continues to increase by 1 until the target power is achieved for a second time, and the trial size at that time is regarded as the recommended final sample size. Moreover, censoring in time-to-event data is another important issue for sample size estimation. Here, two types of right-censored data are considered. The first type consists of the subjects who discontinue the trial. They are identified at random from a Bernoulli distribution based on the prespecified drop-out rate. For simplicity, it is assumed that a drop-out subject would not have left the trial before the event happened and that his/her censored time still comes from the same survival distribution as that of a subject who finishes the trial. That is, the survival time of a discontinued subject is also generated at Step 2-1 in the simulation; such subjects are only flagged as censored, and their survival time does not change upon discontinuation.
Furthermore, subjects whose random survival time from the given survival distribution is longer than the maximum observed time are considered censored, because the events of these subjects cannot be observed by time T. The survival time of such a subject is replaced by T_j = T.
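The interim/final test of Step 2-2-2 is a standard two-sample log-rank test. A minimal pure-Python version of the statistic is sketched below (the macro itself relies on SAS; the small worked dataset is hypothetical, not trial output). The trial stops early at stage i when the two-sided p-value 2(1 − Φ(|Z|)) falls below α_i:

```python
import math

def logrank_z(times, events, groups):
    """Two-sample log-rank statistic Z (normal approximation).

    times: observed times; events: 1 = event, 0 = censored; groups: 0 or 1.
    """
    data = sorted(zip(times, events, groups))
    n0 = groups.count(0)          # number at risk, group 0
    n1 = groups.count(1)          # number at risk, group 1
    o_minus_e, var = 0.0, 0.0
    i, n = 0, len(data)
    while i < n:
        t = data[i][0]
        r0, r1 = n0, n1           # at risk just before time t
        d = d1 = 0                # events at t (total, and in group 1)
        j = i
        while j < n and data[j][0] == t:
            if data[j][1] == 1:
                d += 1
                d1 += data[j][2]
            j += 1
        ntot = r0 + r1
        if d > 0 and ntot > 1:
            o_minus_e += d1 - d * r1 / ntot
            var += d * r0 * r1 * (ntot - d) / (ntot ** 2 * (ntot - 1))
        while i < j:              # remove all subjects tied at time t
            if data[i][2] == 1:
                n1 -= 1
            else:
                n0 -= 1
            i += 1
    return o_minus_e / math.sqrt(var) if var > 0 else 0.0

# Tiny worked example: group 0 has events at t = 1, 2; group 1 at t = 3, 4
z = logrank_z([1, 2, 3, 4], [1, 1, 1, 1], [0, 0, 1, 1])
assert abs(abs(z) - 1.6978) < 1e-3
```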

Results
A practical clinical trial is employed as an example to illustrate the use of the program and to discuss related issues. The primary objective of a placebo-controlled clinical trial was to assess the efficacy and safety of a new drug as third-line treatment in patients with metastatic colorectal cancer. The expected event of the trial was the death of the patient, who would be observed for at most 18 months or until he/she died of the cancer. The primary endpoint of the trial was OS. The sample size ratio of the treatment and placebo groups was 2:1 for ethical reasons. A group sequential design with anticipated early stopping for efficacy was considered by the investigators in the interests of the patients. By literature review, the median OS of the placebo group was assumed to be 4.5 months, and the investigators expected the test drug to lengthen the median OS by 1.5 months compared with the placebo. The sample size was calculated with a type I error rate of 0.05 and a test power of 80%. The drop-out rate was set to 20%. Information time was employed for the group sequential design and the O'Brien-Fleming boundaries were used to control the overall type I error rate. The SAS macro %n_gssur was employed to determine the optimal interim monitoring plan, and the total sample size and the number of deaths to be observed were calculated at the same time. The decision process for the optimal plan is explained under the assumption of the exponential distribution for simplicity. The changes of the total sample size and the other assessment indexes for varied shape parameters under the Weibull distribution are also compared.
(1) The decision process for the optimal interim monitoring plan

To choose the optimal interim monitoring plan, a series of scenarios were assumed to evaluate their differences and the interests of the patients. The scenarios included (A) a traditional single-stage trial, (B) a two-stage trial with equally spaced interim times, (C) a three-stage trial with equally spaced interim times and (D) a three-stage trial with t_1 = 0.5, t_2 = 0.75. For practical reasons, trials with ≥4 stages were not considered here. The sample size was estimated under the assumption of the exponential distribution for simplicity. The seed of the data generation was specified as '123' and 5000 trials were replicated to calculate the empirical power.
As shown in Table 2, 591 patients need to be enrolled and 423 deaths have to be observed in the two groups to guarantee the target power if the traditional single-stage design is used. It is accidental that the total sample size of the two-stage design equals that of the single-stage design, owing to the fluctuation of the calculated empirical power in the simulation, but the expected number of events in the two-stage design clearly decreases. This trend continues in the three-stage designs, which appear better than Scenarios A and B because of the greater chance of early stopping for efficacy. However, in Scenario C the trial has only a 2.64% probability of early stopping at the first stage, which is neither practical nor cost-effective in an actual trial. On the other hand, the chance of early stopping reaches 17.72% at the first stage and rises to 55.54% by the second stage in Scenario D, which seems more reasonable and attractive; the expected number of events is also smaller than that of Scenario C. Although the total sample size of Scenario D is a little larger than those of the other three scenarios, it is still preferable to the sponsors considering that the trial has a larger and more reasonable chance of stopping early at the interim analyses. Therefore, the three-stage design with t_1 = 0.5, t_2 = 0.75 is considered the optimal monitoring plan.

(2) The effects of the varied shape parameter under the Weibull distribution

According to the last section, the three-stage design with t_1 = 0.5, t_2 = 0.75 was adopted in the simulation. The seed '123' was again employed to generate the simulated data and all the indexes were calculated from 5000 replications. The simulation results of Scenario A and Scenario B are shown in Table 3 and Table 4 respectively.
As shown in Table 3 and Table 4, the stage-wise and cumulative empirical powers of each stage remain stable across the varied shape parameters, and their fluctuations are within a reasonable range. Changes in the shape parameter have no effect on Power_i and cPower_i as long as the number and timing of the interim analyses are fixed, so the optimal interim monitoring plan chosen under the exponential distribution also fits the trial no matter how the shape parameter varies. The total sample size, the total number of events and the expected number of events all decline as c increases, because an increasing c raises the hazard ratio when M_T and M_C are fixed, as depicted in Figure 2. Especially when c < 1, the three assessment indexes decrease dramatically. The total sample size exceeds 1000 when c ≤ 0.75, where the hazard ratio approaches 1, and the hazard ratio becomes so large when c ≥ 2.5 that the total sample size falls below 100. In Scenario B, as shown in Figure 3, the total sample size still decreases as c rises, but its range is smaller than in Scenario A. The increasing median survival time of the treatment group, which results from the decreasing c, leads to the larger sample size. The total number and the expected number of events remain stable no matter how the shape parameter varies, because the hazard ratio and the target power are the same. When c ≥ 2.5, n, D and E(D) stay the same and the median survival time of the treatment group approaches that of the control group. It is seen from the results of Scenarios A and B that the total sample size is more sensitive to the hazard ratio than to the median survival time of the treatment group, and that the total number and the expected number of events depend only on the hazard ratio when the target power is held constant.
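The qualitative behavior of Scenario A can be reproduced without simulation using the closed-form single-stage Schoenfeld approximation for the number of events. This is not the simulation estimate used here, only an illustration of why the required events fall as c, and hence h = (M_T/M_C)^c, grows:

```python
import math
from statistics import NormalDist

N = NormalDist()

def schoenfeld_events(h, alpha=0.05, power=0.80, r=2.0):
    """Single-stage Schoenfeld approximation of the required number of events
    for a two-sided log-rank test with allocation ratio r = n_T/n_C."""
    z_a = N.inv_cdf(1.0 - alpha / 2.0)
    z_b = N.inv_cdf(power)
    return (1.0 + r) ** 2 / r * (z_a + z_b) ** 2 / math.log(h) ** 2

m_t, m_c = 6.0, 4.5                      # medians from the example (months)
events = {c: schoenfeld_events((m_t / m_c) ** c) for c in (0.75, 1.0, 1.5, 2.5)}

# The hazard ratio h = (M_T/M_C)**c rises with c, so required events fall
assert events[0.75] > events[1.0] > events[1.5] > events[2.5]
# As c shrinks, h -> 1 and the required number of events explodes
assert events[0.75] > 2 * events[2.5]
```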

Discussion
The sample size estimation is an important and indispensable part of trial design. It is more complicated for time-to-event data because of the ambiguous distributional form and the censored data, and the different definitions of information time for time-to-event endpoints make it more difficult still. On the other hand, compared with the available theoretical methods [5][6][7][8][9][10][11][12][13], the simulation-based approach to sample size estimation is easier to implement in an actual trial and can incorporate various practical considerations, which leads to a more accurate estimate. Therefore, a practical and simple simulation-based method is proposed to calculate the sample size of a two-arm survival group sequential trial, and its SAS tool is available as Program S1. In the simulation, the inclusion of the probability of discontinuation contributes to the more accurate estimate. Both the exponential and Weibull distributions are considered. Compared with Heo's method [15], which gave a formula to calculate the sample size for a single-stage trial under the Weibull distribution, our method focuses on the multi-stage group sequential design. Besides the sample size, the assessment indexes D, E(D), Power_i and cPower_i are calculated in the simulation to help determine the optimal interim monitoring plan for a group sequential trial.
To implement the proposed method, one has to prespecify the survival distribution, the median survival times of the treatment and control groups, the maximum follow-up time, the number of trial stages, the interim time schedule, the dropout rate, the sample size ratio, the target power, etc. Of these, the maximum follow-up time, the dropout rate, the sample size ratio and the target power are prespecified according to the suggestions of the investigators and sponsors. The accrual information, including the accrual time, accrual rate and accrual distribution, does not have to be prespecified, which differs from the methods of some commercial software packages. That is because the accrual information depends on too many factors, e.g., the incidence rate of the disease, the number of patients at the site, the inclusion/exclusion criteria, the working efficiency of the trial monitor, etc. It is usually difficult for clinical investigators to predict the accrual rate and accrual distribution, and if overly optimistic accrual information is assumed, the estimated sample size cannot guarantee the power to detect the difference and the efficacy of the test drug may be missed. For that reason, the simulation conservatively presumes that a subject enters the study only after the previous one has finished the trial. At the interim analysis, only subjects who have finished the trial are included in the analysis and no one in the cohort is still at risk; the accrual time, accrual rate and accrual distribution therefore do not affect the sample size and need not be defined. Of course, this presumption results in a conservative, larger estimate of the sample size, but the group sequential design still makes it possible to reject the null hypothesis early at an interim analysis if the accrual is better than anticipated.
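Under the conservative enrolment presumption, the simulation only needs per-subject event times, administrative censoring at the maximum follow-up and random discontinuation. A minimal sketch of one simulated arm follows; the function name, the uniform distribution of discontinuation times and all parameter values are illustrative assumptions, not the authors' SAS implementation:

```python
import math
import random

def simulate_arm(n, median, shape, max_follow, dropout_rate, rng):
    """Simulate one arm: Weibull event times with administrative censoring
    at max_follow and random discontinuation (dropout)."""
    # Weibull scale lambda from the median: M = lambda * (ln 2)^(1/shape)
    scale = median / math.log(2) ** (1.0 / shape)
    times, events = [], []
    for _ in range(n):
        # Inverse-transform draw from Weibull(shape, scale)
        t = scale * (-math.log(rng.random())) ** (1.0 / shape)
        if rng.random() < dropout_rate:
            # Assumed: discontinuation time uniform over the follow-up window
            t_drop = rng.uniform(0.0, max_follow)
            if t_drop < t:
                times.append(t_drop)
                events.append(0)   # censored at dropout
                continue
        if t > max_follow:
            times.append(max_follow)
            events.append(0)       # administratively censored
        else:
            times.append(t)
            events.append(1)       # event observed
    return times, events

rng = random.Random(123)  # seed '123', as in the paper's simulations
t, e = simulate_arm(300, median=4.5, shape=1.0, max_follow=18.0,
                    dropout_rate=0.1, rng=rng)
print(sum(e), "events observed out of", len(t), "subjects")
```

Replicating such arms for both groups, applying the log-rank test at each stage, and counting rejections over many replications yields the empirical power on which the sample size search is based.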
In group sequential trials, the number, timing and boundaries of the interim analyses are closely related to the sample size and the number of preplanned events. Both information time and calendar time are available through the calendar time-information time transformation. The stopping boundaries of the interim analyses can be calculated conveniently using SAS 9.2, the GroupSeq package of R, etc., when the Pocock method, the O'Brien-Fleming method or the error-spending method is considered; as long as the overall type I error rate is well controlled, the stopping boundaries can also be determined by performing extensive simulations. The simulation-based method proposed here makes it possible to choose the optimal number of trial stages and interim monitoring schedule by comparing the calculated assessment indexes. Moreover, the median survival time, hazard ratio and survival distribution directly determine the sample size and the number of events. The median survival times and hazard ratio of the two groups reflect the efficacy of the test drug and satisfy HR = (M_T/M_C)^c when the Weibull distribution is assumed. From the simulation results in Tables 3 and 4, it is seen that the number of events that must be observed to guarantee the target power depends only on the hazard ratio, while the number of subjects to be accrued is more sensitive to the hazard ratio than to the median survival time. When HR = 1.333 and c ≥ 2.5, the sample size remains the same no matter how the shape parameter varies, but the test drug cannot lengthen the survival time of the patients significantly compared with the placebo. However, when c < 1, the patients may benefit from the prolonged survival time, though a larger sample size is needed. On the other hand, when the survival times of the two groups are fixed, the hazard ratio, which varies with the shape parameter of the Weibull distribution, directly determines the sample size.
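The remark that boundaries can be validated by simulation, provided the overall type I error is controlled, can be sketched with the canonical Brownian-motion structure of sequential z-statistics, z_k = B(t_k)/sqrt(t_k) at information times t_k. An O'Brien-Fleming-shaped boundary b_k = C/sqrt(t_k) is crossed exactly when |B(t_k)| > C, so the attained two-sided alpha for a candidate constant C is easy to estimate by Monte Carlo (the candidate constants below are illustrative, not calibrated values):

```python
import math
import random

def overall_alpha(C, info_times, nsim=20000, seed=123):
    """Monte Carlo estimate of the overall two-sided type I error of
    O'Brien-Fleming-shaped boundaries b_k = C / sqrt(t_k), using the
    Brownian-motion representation of the sequential z-statistics."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(nsim):
        w, prev_t = 0.0, 0.0
        for t in info_times:
            # Independent Brownian increment over (prev_t, t]
            w += math.sqrt(t - prev_t) * rng.gauss(0.0, 1.0)
            prev_t = t
            z = w / math.sqrt(t)          # z_k = B(t_k) / sqrt(t_k)
            if abs(z) > C / math.sqrt(t): # equivalent to |B(t_k)| > C
                rejections += 1
                break
    return rejections / nsim

# Three-stage design at information times 0.5, 0.75, 1.0: scan candidate
# constants and keep the smallest one whose attained alpha is below 0.05.
for C in [1.8, 2.0, 2.2]:
    print(C, overall_alpha(C, [0.5, 0.75, 1.0]))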
When c < 0.7, the low hazard ratio leads to such a large sample size that it may be unnecessary to start the trial, given the poor efficacy. When c > 2, the small sample size resulting from the high hazard ratio makes it possible to shorten the follow-up time and accrue more patients, quickening the whole development process of the drug and possibly saving trial costs.
In an actual clinical trial, the median survival time is an intuitive endpoint for clinical investigators and reflects the benefit patients derive from the drug. It is relatively easy for a clinician to estimate the median survival times of the treatment and control groups from the literature and earlier studies of the drug. The hazard ratio is defined here as the hazard rate of the event in the control group relative to the treatment group; for a trial with hazard ratio HR, the test drug is thought to decrease the risk of the event by (1-1/HR) compared with the control group. It is also possible for the clinical investigator to estimate the hazard ratio, with the help of the biostatistician and other participants, from earlier studies and the expectations for the test drug. Therefore, when the Weibull distribution, which describes time-to-event endpoints more flexibly because of its shape parameter, is assumed in a trial, it is suggested that the acceptable ranges of the median survival time of the test drug and of the hazard ratio be estimated first in order to calculate c. Let M_L be the minimum clinically significant median survival time of the test drug and M_U be the largest expected median survival time, and let the estimated range of the hazard ratio be [HR_L, HR_U], where HR_L and HR_U denote its lower and upper limits. According to the simulation results, the shape parameter of the Weibull distribution is, from a conservative standpoint, calculated as c = ln(HR_L)/ln(M_U/M_C), where M_C denotes the median survival time of the control group. We again take the metastatic colorectal cancer clinical trial mentioned above as an example to illustrate how to estimate c explicitly. It was predicted that the test drug could lengthen the median OS by 2.5 months at best and that the median OS of the placebo group was 4.5 months.
The test drug was expected to decrease the risk of death by 25%-30% compared with the placebo. Accordingly, the range of the hazard ratio was [1.333, 1.429] and the shape parameter of the Weibull distribution was estimated as 0.651 using Eq. (7). When the three-stage group sequential design with t1 = 0.5, t2 = 0.75 was employed and O'Brien-Fleming boundaries were considered, 696 subjects and 420 deaths were necessary to achieve 80% power to detect the difference. Whether the fluctuation of the sample sizes across random seeds leads to an inaccurate estimate is another vital concern about the simulation-based approach. To explore this, 10 different random seeds were specified to calculate the sample sizes under the assumption of the exponential distribution for the cancer clinical trial introduced in the example, again with the three-stage design with t1 = 0.5, t2 = 0.75. The derived sample sizes range from 588 to 606, with a mean of 598.2 and a standard deviation of 5.13. Correspondingly, the number of deaths ranges from 421 to 434, with a mean of 428.1 and a standard deviation of 3.81. Ten simulations appear to be enough to provide a robust estimate; therefore, to obtain a robust estimate of the sample size, 10 or more simulations with various seeds are suggested and their mean is recommended as the final sample size. Moreover, we find that the sample sizes calculated under the exponential distribution and under the Weibull distribution with a shape parameter of 1 are not exactly equal even when the same seed is specified, as seen in Tables 2 and 3. This is because different SAS functions were called to generate the random number streams, which causes the differences in the derived sample sizes. Nevertheless, the mean sample size under the Weibull distribution with c = 1, estimated from 10 simulations with the same seeds as those of the exponential distribution, is 601.8.
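The conservative shape estimate of Eq. (7) is straightforward to reproduce from the example's inputs (the function name is ours, introduced only for this sketch):

```python
import math

def estimate_shape(hr_lower, m_upper, m_control):
    """Conservative Weibull shape estimate: c = ln(HR_L) / ln(M_U / M_C),
    using the lower hazard-ratio limit and the largest expected treatment
    median, as suggested in the text (Eq. (7))."""
    return math.log(hr_lower) / math.log(m_upper / m_control)

# Example from the text: placebo median OS 4.5 months, best-case gain of
# 2.5 months (M_U = 7.0), hazard ratio in [1.333, 1.429].
c = estimate_shape(1.333, 4.5 + 2.5, 4.5)
print(round(c, 3))  # ≈ 0.651, matching the reported estimate
```

Using HR_L together with M_U pushes c downward, and a smaller c yields a larger sample size, which is what makes the choice conservative.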
It is very close to the mean sample size under the exponential distribution.
Adaptive design, which is considered an extension of group sequential design, has been increasingly applied in recent years. It allows sample size re-estimation after an interim analysis to cope with an inaccurate design-stage estimate of the efficacy of the drug. The simulation-based approach introduced in this article, which originates from group sequential design, can also be applied in adaptive design through the following strategy. In the planning phase of the trial, a series of possible scenarios for the efficacy of the test drug are assumed in advance and the sample size scheme for each scenario is calculated with %n_gssur. After the interim analysis, the investigator can choose the suitable sample size from the scheme for the subsequent stage according to the interim results. Under this strategy, no modification of the program is necessary, and the integrity of the adaptive design is protected to a greater extent than with methods that recalculate the sample size after the interim analysis. However, error-spending boundaries may not fit this strategy and then have to be determined by extensive simulations; the detailed procedures for stopping boundary determination and sample size scheme calculation can be found in [27]. On the other hand, minor modifications of the program are inevitable if sample size recalculation after the interim analysis is considered. To run the macro for sample size re-calculation, the interim data are necessary and the statistical test procedure has to be modified: the log-rank test is still suggested for the interim/final analyses and the derived p-values are pooled using the inverse normal method [26,28]. With these modifications, the SAS program can be further applied to re-estimate the sample size in adaptive design.
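The inverse normal combination mentioned above can be sketched in a few lines: given stage-wise one-sided p-values p_k and prespecified weights w_k, the pooled statistic is Z = sum(w_k * Phi^{-1}(1 - p_k)) / sqrt(sum(w_k^2)). A minimal sketch follows; the stage-wise p-values are hypothetical illustrative inputs:

```python
from math import sqrt
from statistics import NormalDist

def inverse_normal_combine(p_values, weights):
    """Pool stage-wise one-sided p-values with the inverse normal method:
    Z = sum(w_k * Phi^{-1}(1 - p_k)) / sqrt(sum(w_k^2))."""
    nd = NormalDist()
    num = sum(w * nd.inv_cdf(1.0 - p) for w, p in zip(weights, p_values))
    z = num / sqrt(sum(w * w for w in weights))
    return z, 1.0 - nd.cdf(z)  # combined z-statistic and one-sided p-value

# Hypothetical log-rank p-values from two stages with equal weights;
# in practice the weights are fixed at the design stage.
z, p = inverse_normal_combine([0.08, 0.03], [1.0, 1.0])
print(round(z, 3), round(p, 4))
```

Because the weights are fixed in advance, the combined test remains valid even when the second-stage sample size is modified after the interim look, which is the property the re-estimation procedure relies on.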