A Practical Simulation Method to Calculate Sample Size of Group Sequential Trials for Time-to-Event Data under Exponential and Weibull Distribution

  • Zhiwei Jiang,

    Affiliation Department of Health Statistics and the Ministry of Education Key Lab of Hazard Assessment and Control in Special Operational Environment, School of Preventative Medicine, Fourth Military Medical University, Xi'an, Shaanxi, China

  • Ling Wang,

    Affiliation Department of Health Statistics and the Ministry of Education Key Lab of Hazard Assessment and Control in Special Operational Environment, School of Preventative Medicine, Fourth Military Medical University, Xi'an, Shaanxi, China

  • Chanjuan Li ,

    chanjuan_li@sohu.com (CL); jielaixia@yahoo.com (JX)

    Affiliation Department of Health Statistics and the Ministry of Education Key Lab of Hazard Assessment and Control in Special Operational Environment, School of Preventative Medicine, Fourth Military Medical University, Xi'an, Shaanxi, China

  • Jielai Xia ,

    Affiliation Department of Health Statistics and the Ministry of Education Key Lab of Hazard Assessment and Control in Special Operational Environment, School of Preventative Medicine, Fourth Military Medical University, Xi'an, Shaanxi, China

  • Hongxia Jia

    Affiliation Department of Endocrinology, Jinan General Hospital, Jinan, Shandong, China

Abstract

Group sequential designs have been widely applied in clinical trials over the past few decades, and sample size estimation is a vital concern of sponsors and investigators. For group sequential trials with time-to-event endpoints it is a particularly thorny problem because of the ambiguous distributional form of survival data, censoring, and the different definition of information time. In this article, a practical and easy-to-use simulation-based method is proposed for multi-stage two-arm survival group sequential designs, and its SAS program is available. Besides the exponential distribution, which is usually assumed for survival data, the Weibull distribution is considered. Incorporating the probability of discontinuation into the simulation leads to a more accurate estimate. The assessment indexes calculated in the simulation help determine the number and timing of the interim analyses. The use of the method in survival group sequential trials is illustrated, and the effects of varying the shape parameter on the sample size under the Weibull distribution are explored, through a worked example. Based on the simulation results, a method is proposed to estimate the shape parameter of the Weibull distribution from the median survival time of the test drug and the hazard ratio, both of which are prespecified by the investigators and other participants. Ten or more simulations with different random seeds are recommended to obtain a robust estimate of the sample size. Furthermore, the method remains applicable to adaptive designs, either by adopting the sample-size-scheme strategy at the design stage or by making minor modifications to the program.

Introduction

In the past few decades, group sequential design has been widely used in clinical trials, and sample size calculation is of particular importance when designing a group sequential trial. An adequate sample size guarantees the power to detect the treatment difference, while an excessive sample size is undesirable because of study time and costs. Various methods, such as the Pocock design, the O'Brien-Fleming design and the error spending function, were proposed for designing group sequential trials [1]–[3], and the corresponding approaches to sample size estimation for continuous and dichotomous variables have also been discussed [4], [5]. Compared with normal and binary data, sample size estimation for time-to-event data is more complicated because of its ambiguous distributional form and censoring. However, time-to-event outcomes are frequently employed in clinical trials; for example, endpoints such as overall survival (OS), progression-free survival (PFS) and time to progression (TTP) are usually adopted as primary endpoints in oncology trials. Survival analysis is necessary to analyze time-to-event data, and group sequential design is often indispensable in oncology trials for ethical reasons. Therefore, it is helpful to have a practical and simple method to calculate the sample size when a group sequential design is applied to a clinical trial with a time-to-event endpoint.

For survival data, the exponential and Weibull distributions are the two most frequently used parametric models. Of the two, the Weibull distribution is more appropriate for describing time-to-event data in most cases because, besides the scale parameter shared with the exponential distribution, it includes a shape parameter. The shape parameter makes it possible to describe a hazard rate that varies over the study interval, which is common in medical studies. Recently, several approaches have been proposed to estimate or re-estimate the sample size for group sequential and adaptive designs [6]–[13], but few of them consider the Weibull distribution. Only Li and Murphy [6], Togo and Iwasaki [7] and Lu et al [9] discussed the effects of varied shape parameters under the Weibull distribution on the sample size in two-stage clinical trials; most of these methods were introduced under the assumption of the exponential distribution [8], [10]–[13]. The Weibull distribution is also seldom available in commercial tools; for example, PASS is only able to calculate the sample size for survival data under the exponential distribution [14]. Though Heo et al proposed a formula to calculate the sample size for survival data under the Weibull distribution [15], it applies only to traditional single-stage trials and cannot be used for multi-stage group sequential designs. Furthermore, the approaches in this literature are usually based on particular statistical tests or trial procedures and are sometimes complicated to apply in actual clinical trials. On the other hand, Monte Carlo simulation is an easy and flexible way to estimate the sample size for a trial with a particular design, procedure, statistical test and target power [16]. Changes to the trial procedure and other demands on the trial can be conveniently incorporated into the simulation so that an accurate sample size is calculated. The simulation-based method is also adopted in some commercial software packages, such as PASS and nQuery, but the Weibull distribution, which is more suitable for describing survival data, is not usually considered there. Therefore, a practical and simple method, which employs Monte Carlo simulation to calculate the sample size for multi-stage two-arm group sequential trials with a time-to-event endpoint, is proposed in this paper, and its SAS program has been developed. Both the exponential and Weibull distributions are considered, and the simulation is based on the log-rank test. The probability of discontinuation in the trial is also included, which leads to a more accurate estimate. Besides the recommended sample size, a series of assessment indexes is calculated in the program to determine the optimal group sequential plan. To call the program for sample size determination when the Weibull distribution is assumed, a method of estimating the shape parameter from the hazard ratio and the median survival time is suggested from a practical perspective. Moreover, the generalization of the method to adaptive design is discussed.

Methods

Parametric distributions of time-to-event data

Because various physical causes result in the occurrence of a specified event and it is impossible to isolate and account mathematically for all of them, the theoretical distribution of a time-to-event outcome is difficult to define in clinical trials [17]. This complicates sample size determination and the statistical analysis of survival data. Common parametric distributions for survival data include the exponential, Weibull, gamma, log-normal, log-logistic, normal, exponential power, Gompertz, inverse Gaussian, Pareto and generalized gamma distributions [17]. Of them, the exponential and Weibull distributions are the two most important forms for modeling survival data and are frequently employed to build parametric survival models.

The exponential distribution is the simplest and most important parametric distribution of time-to-event data. It is often referred to as a purely random failure pattern. The exponential distribution was first proposed to model failure data by Davis [18], and Epstein and Sobel [19], [20] discussed why it should be preferred over the popular normal distribution for describing survival data. Until now, it still plays a major role in modeling time-to-event data due to its simplicity. The exponential distribution is characterized by a constant hazard rate λ; a large λ indicates high risk and short survival. Let t be the independent continuous time variable. For survival data following the exponential distribution, the hazard rate at time t is defined as

$$h(t) = \lambda \tag{1}$$

and the survival rate at time t is

$$S(t) = e^{-\lambda t}. \tag{2}$$

Here, λ denotes the scale parameter of the exponential distribution. However, the constant hazard rate of the exponential distribution leads to the "lack of memory" property, which appears too restrictive when it is employed to describe time-to-event data in clinical trials.

Unlike the exponential distribution, the Weibull distribution allows for a varying hazard rate by including a shape parameter γ besides the scale parameter λ. The distribution and its applicability to failure data were introduced by Weibull [21], [22]. It is thought to be more appropriate than the exponential distribution for modeling time-to-event data because of its varying hazard rate. A survival time following the Weibull distribution has the hazard function

$$h(t) = \lambda\gamma t^{\gamma-1} \tag{3}$$

and the survival function

$$S(t) = e^{-\lambda t^{\gamma}}. \tag{4}$$

Here, t is the independent continuous time variable, and λ and γ denote the scale and shape parameters of the Weibull distribution respectively. The hazard rate increases with t when γ > 1 and decreases when γ < 1. The exponential distribution is the special case of the Weibull distribution with γ = 1, for which the hazard rate is constant.
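Setting S(M) = 0.5 in Eq. (2) or Eq. (4) gives the scale parameter λ = ln 2 / M^γ for a prespecified median survival time M, with γ = 1 recovering the exponential case, and survival times can then be drawn by inverting the survival function. As a minimal illustration of this parameterization (the tool accompanying this article is a SAS macro; the following Python sketch, with hypothetical function names, is only an aid to understanding):

```python
import numpy as np

def weibull_scale(median, gamma):
    """Scale parameter solving S(median) = exp(-lam * median**gamma) = 0.5."""
    return np.log(2.0) / median ** gamma

def rand_weibull(median, gamma, size, rng):
    """Inverse-transform draw from S(t) = exp(-lam * t**gamma):
    with U ~ Uniform(0,1), t = (-ln U / lam)**(1/gamma)."""
    lam = weibull_scale(median, gamma)
    u = rng.uniform(size=size)
    return (-np.log(u) / lam) ** (1.0 / gamma)

rng = np.random.default_rng(123)
t = rand_weibull(median=4.5, gamma=1.0, size=100_000, rng=rng)
print(np.median(t))  # close to 4.5 by construction
```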

Compared with other parametric survival distributions, the exponential and Weibull distributions are of particular importance and are generally applied in modeling time-to-event data in clinical trials. Although the simple form of the exponential distribution makes it easy to fit parametric models for survival outcomes, the Weibull distribution is superior for describing time-to-event data in medical studies, e.g., through its varying hazard rate. Therefore, both the exponential and Weibull distributions are employed in this article to estimate the sample size of group sequential designs with a time-to-event endpoint.

Group sequential design of time-to-event data

Compared with traditional trial designs, group sequential design allows early stopping for efficacy or futility based on the results of interim analyses, which effectively improves trial efficiency, shortens trial duration and, on some occasions, saves trial costs. As a result, it has attracted much attention from trial participants and has been widely applied in clinical trials. When designing a group sequential trial, it is essential for biostatisticians to choose the optimal number of trial stages, the interim time schedule and the stopping boundaries of the interim analyses [23]. To preserve the type I error probability in a group sequential design, many methods, e.g. the Pocock design, the O'Brien-Fleming design and the error spending function [1]–[3], were introduced to calculate the boundaries for early stopping for efficacy. The Pocock method spreads the probability of early stopping evenly across the interim analyses, while the O'Brien-Fleming method takes a conservative attitude at the early interim time points; the latter inflates the traditional single-stage sample size so little that it has become one of the most widely used methods in group sequential design [24]. The error spending approach regards the trial as a process of consuming the type I error rate. An increasing function α(t), which characterizes the rate at which the type I error rate is spent, has to be prespecified for a particular trial, and several such functions were proposed by Lan et al [3]. Here, t denotes the fraction of the total information, no matter whether information time or calendar time is employed in the trial [25]. The method approximately reproduces the Pocock design when α(t) = α ln[1 + (e − 1)t] and the O'Brien-Fleming design when α(t) = 2{1 − Φ(z_{α/2}/√t)}. Depending on these methods, it is possible to perform extensive simulations to determine the optimal interim monitoring plan, including the number of trial stages and the time schedule of the interim analyses, and to calculate the sample size.
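For reference, the two spending functions just named are easy to evaluate at any information fraction t. A small Python sketch, assuming the standard Lan-DeMets forms quoted above and purely illustrative, equally spaced information fractions:

```python
from math import exp, log, sqrt
from scipy.stats import norm

def obf_spending(t, alpha):
    """O'Brien-Fleming-type spending: alpha(t) = 2*(1 - Phi(z_{alpha/2} / sqrt(t)))."""
    return 2.0 * (1.0 - norm.cdf(norm.ppf(1.0 - alpha / 2.0) / sqrt(t)))

def pocock_spending(t, alpha):
    """Pocock-type spending: alpha(t) = alpha * ln(1 + (e - 1) * t)."""
    return alpha * log(1.0 + (exp(1.0) - 1.0) * t)

for t in (1/3, 2/3, 1.0):  # both functions spend the full alpha at t = 1
    print(f"t={t:.2f}  OBF={obf_spending(t, 0.05):.4f}  Pocock={pocock_spending(t, 0.05):.4f}")
```

At t = 1 both functions return the full 0.05, while the O'Brien-Fleming-type function spends far less error at early looks than the Pocock-type one.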

Although the methods mentioned above apply equally to survival trials, there are still some differences between trials with survival endpoints and those with other types of endpoints. The definition of information time in survival group sequential trials differs from that in trials with a normal or binary endpoint: a subject provides full information to a survival trial only when the event occurs. The information time at an interim analysis is therefore the proportion of the maximum number of events already observed, not the proportion of subjects who have completed the trial [25]. Consequently, sample size estimation in survival trials means calculating the number of events required for the subsequent stages [26]. Though the number of events guarantees the test power, investigators and sponsors are more interested, for practical reasons, in the number of subjects to be accrued. The transformation from the number of events to the number of subjects is therefore essential for sample size estimation in survival trials, and censoring makes it more complicated still. Therefore, a Monte Carlo simulation-based method is employed and its SAS tool has been developed in this article. All these factors are incorporated into the simulation to estimate the sample size: both the number of events to be observed and the number of subjects to be accrued are estimated. Moreover, besides the total sample size, a series of assessment indexes is calculated to help determine the optimal interim monitoring plan.

The simulation-based method for sample size estimation

Here, we consider a two-arm group sequential trial with a survival outcome. The hypothesis to be tested is

$$H_0: M_T = M_C \quad \text{versus} \quad H_1: M_T \neq M_C,$$

where M_T and M_C denote the median survival times of the treatment and control groups respectively. Under the assumption of proportional hazards, the hypothesis is equivalent to H_0: HR = 1, where HR = h_C(t)/h_T(t) is the hazard ratio. Assuming that the trial is divided into k stages, the i-th (i = 1,2,…,k−1) interim analysis is performed after the i-th stage of the trial and the final analysis is implemented when all patients complete the trial. At the i-th interim analysis, the trial stops early for efficacy if the derived p_i ≤ α_i, where α_i denotes the nominal significance level of the i-th stage; otherwise, the trial continues to the subsequent stage. Early stopping for futility is not considered in this article. At the final analysis, the trial is considered successful when H_0 is rejected and failed when H_0 is accepted.

To calculate the sample size of a survival group sequential trial, the trial parameters related to the trial size have to be prespecified besides the overall type I error rate α and the overall test power 1−β. First of all, the median survival times M_T and M_C and the hazard ratio HR directly contribute to the size of the trial. Here, the hazard ratio can be derived as HR = (M_T/M_C)^γ under the assumption of proportional hazards, where γ is the shape parameter of the Weibull distribution (γ = 1 in the exponential case). A higher hazard ratio leads to a smaller sample size and a lower ratio brings a larger one when M_C is fixed. M_T and M_C determine the scale parameters λ_T and λ_C of the treatment and control groups respectively under the prespecified survival distribution. Secondly, the maximum observation time T of a subject is related to the sample size: the longer the patients are followed up, the smaller the required sample size. Thirdly, the interim monitoring plan, including the number of trial stages k, the interim time schedule and the nominal significance levels α_i of the interim analyses, affects the trial size. Compared with a traditional single-stage design, the multiple interim analyses of a group sequential design inflate the sample size for a given type I error rate and test power; the more stages are planned, the larger the sample size needed. The degree of sample size inflation also depends on the nominal significance levels of the interim analyses. As mentioned above, the O'Brien-Fleming boundaries result in the smallest sample size among the available approaches. The nominal significance levels can also be determined by performing extensive simulations, as long as the overall type I error rate is well controlled. In addition, the survival distributional form is another important trial parameter for sample size estimation; both the exponential and Weibull distributions are considered here, and the form has to be prespecified in the simulation. When the Weibull distribution is assumed, the magnitude of the shape parameter γ has to be defined at the same time. The drop-out rate π and the sample size ratio r of the two groups are also necessary to calculate the sample size. In the simulation, it is presumed that the subjects are enrolled in sequence and that a subject enters the study only when the former one has finished the trial. A subject is considered to have finished the trial if he/she dies, discontinues or has been followed up for the prespecified longest duration. Owing to this presumption, only the subjects who have finished the trial at an interim analysis are included in the data analysis and no one in the cohort is still at risk at that time. Therefore, the accrual information, including the accrual time, accrual rate and accrual distribution, has no effect on the sample size, and it is unnecessary to define it in the simulation.
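To make these parameter relations concrete, the reconstructed identities λ = ln 2/M^γ and HR = (M_T/M_C)^γ can be evaluated for the medians used in the worked example of the Results (M_T = 6, M_C = 4.5 months); a hedged Python sketch with hypothetical names:

```python
import numpy as np

def design_params(m_t, m_c, gamma):
    """Scale parameters and hazard ratio implied by two medians under
    S(t) = exp(-lam * t**gamma) with a common shape parameter gamma."""
    lam_t = np.log(2.0) / m_t ** gamma
    lam_c = np.log(2.0) / m_c ** gamma
    hr = lam_c / lam_t  # algebraically equal to (m_t / m_c)**gamma
    return lam_t, lam_c, hr

print(design_params(6.0, 4.5, 1.0))  # HR = 1.333 in the exponential case
print(design_params(6.0, 4.5, 2.0))  # a larger gamma inflates the hazard ratio
```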

In addition to the number of events to be observed and the number of subjects to be accrued in a trial, a series of assessment indexes are calculated to help determine the optimal interim monitoring plan in the simulation. They are listed and explained as follows.

(i) The stage-wise empirical power p_i and cumulative empirical power P_i at the i-th stage. The stage-wise empirical power at the i-th stage is defined as the proportion of simulated trials which reject the null hypothesis at the i-th stage when H_1 is true, and is calculated by p_i = N_i/N, where N denotes the number of all simulated trials and N_i is the number which succeed at the i-th interim analysis. It reflects the probability of early stopping for efficacy at the i-th stage. The cumulative empirical power is calculated by P_i = p_1 + … + p_i, and it equals the overall empirical power when i = k. It should be noted that p_i is just the stage-wise empirical type I error rate at the i-th stage when H_0 is assumed, because it is then the proportion of trials which reject the null hypothesis at the i-th stage when H_0 is true; equally, P_i becomes the cumulative empirical type I error rate under the null hypothesis. Therefore, as long as the median survival times and the nominal significance levels α_i are prespecified, both the empirical power and the empirical type I error rate can be calculated with the same program.

(ii) The expected number of events to be observed in a trial. For a k-stage group sequential trial with a time-to-event outcome, let D_i be the cumulative number of events observed by the i-th stage. The expected number of events to be observed in a trial is calculated by

$$E(D) = \sum_{i=1}^{k-1} p_i D_i + \left(1 - \sum_{i=1}^{k-1} p_i\right) D_k. \tag{5}$$

Compared with the total number of events D = D_k, E(D) takes account of both the number of events to be observed and the probability of early stopping for efficacy, so a cost-effectiveness consideration is included when assessing and determining the optimal interim monitoring plan.
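A compact sketch of how these indexes might be computed from simulation output; the array layout is hypothetical and not the authors' SAS implementation:

```python
import numpy as np

def assessment_indexes(stop_stage, events_by_stage):
    """stop_stage: for each simulated trial, the stage (1..k) at which H0 was
    rejected, or 0 if it never was. events_by_stage: cumulative events D_1..D_k
    (assumes k >= 2). Returns p_i, P_i and E(D) per Eq. (5) as reconstructed."""
    k = len(events_by_stage)
    p = np.array([(stop_stage == i).mean() for i in range(1, k + 1)])
    cum = np.cumsum(p)
    # Trials stopping early at stage i < k observe D_i events; every trial that
    # reaches the final stage (rejecting or not) observes D_k events.
    e_d = np.sum(p[:-1] * events_by_stage[:-1]) + (1.0 - cum[-2]) * events_by_stage[-1]
    return p, cum, e_d

stops = np.array([1, 0, 2, 3, 1])  # toy outcome of five simulated trials
print(assessment_indexes(stops, np.array([140.0, 212.0, 283.0])))
```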

Depending on the prespecified trial parameters, a Monte Carlo simulation based on the trial procedure is implemented to calculate the sample size, the number of events to be observed and the other assessment indexes. In the simulation, the log-rank test is employed as the survival analysis method at the interim and final analyses. All subjects who finish the trial before the i-th stage are pooled for the i-th interim analysis. Correspondingly, a SAS macro %n_gssur based on this simulation method was developed to calculate the sample size and assessment indexes for survival group sequential designs. The input macro parameters are listed in Table 1. The macro depends on two sub-macros, %exp_gen and %weibull_gen, in which the survival times are generated at random under the exponential and Weibull distributions respectively. The detailed program flow follows the trial procedure and the program requirements; it is described in detail below and the flow chart is shown in Figure 1.

Table 1. The input macro parameters in the SAS macro %n_gssur.

https://doi.org/10.1371/journal.pone.0044013.t001

Step 1.

Prepare for the Monte Carlo simulation. The necessary trial parameters for sample size estimation are obtained from the macro parameters shown in Table 1. The maximum follow-up time, the sample size ratio, the drop-out rate and the target power are defined according to the input of the clinical investigators and sponsors, as described in the trial protocol. For the interim monitoring plan, including the number, timing and boundaries of the interim analyses, a series of scenarios should be tried and the optimal plan chosen by comparing the assessment indexes mentioned above; this is illustrated with an example in the Results. The distributional form of the time-to-event data is defined according to the statistician's assumption, and the shape parameter has to be given when the Weibull distribution is assumed, which is discussed in detail in the Results and Discussion. The median survival times of the treatment and control groups are prespecified based on the literature review and the expected efficacy of the test drug. The scale parameters of the survival distributions and the survival rates of the treatment and control groups at time T are calculated from the median survival times by employing Eq. (2) or Eq. (4), according to the assumed survival distribution. The other macro parameters serve the requirements of the program. Generally speaking, at least 1000 replications are necessary in the simulation for a robust estimate. The starting and ending numbers of the sample size search can be set according to the sample size of the traditional single-stage design, and multiple attempts are indispensable.

Step 2.

A series of loops is performed to search for the minimum sample size of the control group, from the prespecified starting number until the target power is achieved or the ending number is reached. Let n_T and n_C be the sample sizes of the treatment and control groups respectively. The total sample size can be derived by n = n_T + n_C = (1 + r) n_C, where r denotes the sample size ratio n_T/n_C. Given n, the drop-out rate π and the survival rates S_T(T) and S_C(T) of the treatment and control groups at time T, the total number of preplanned events in the trial can be calculated by

$$D = (1-\pi)\left[\,n_T\,(1 - S_T(T)) + n_C\,(1 - S_C(T))\,\right]. \tag{6}$$

Here, the drop-out rates of the two groups are assumed to be equal. Depending on Eq. (6), the number of events which should be observed to achieve the target power is derived once the final sample size is found in the loops. In each loop, a simulated trial with the particular sample size is generated randomly according to the prespecified survival distribution and analyzed based on the trial procedure. N trials are simulated to calculate the empirical power and the other assessment indexes. The detailed procedures are described in sequence as follows.
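Under this reconstruction of Eq. (6), the single-stage figures of the worked example can be checked directly. A Python sketch; the 2∶1 allocation, 20% drop-out rate and 18-month maximum follow-up come from the Results, and the control-arm size of 197 is chosen only to reproduce the reported totals:

```python
import numpy as np

def planned_events(n_c, r, dropout, s_t_at_T, s_c_at_T):
    """Total planned events D per Eq. (6): non-dropouts whose event occurs
    before the maximum follow-up time T, summed over both arms."""
    n_t = r * n_c
    return (1.0 - dropout) * (n_t * (1.0 - s_t_at_T) + n_c * (1.0 - s_c_at_T))

# Exponential case: S(T) = exp(-ln2 / median * T) with T = 18 months
s_t = np.exp(-np.log(2.0) / 6.0 * 18.0)  # = 0.125
s_c = np.exp(-np.log(2.0) / 4.5 * 18.0)  # = 0.0625
print(planned_events(n_c=197, r=2, dropout=0.20, s_t_at_T=s_t, s_c_at_T=s_c))
```

With these inputs the formula yields about 423.6 events for 591 enrolled subjects, consistent with the 591 patients and 423 deaths reported for the single-stage design in Table 2.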

Step 2-1.

Generate the random survival times under the prespecified survival distribution. Equal shape parameters across the two groups are assumed when the Weibull distribution is considered. When the data are generated, λ is set to λ_T or λ_C according to the treatment assignment of the subject, where λ_T and λ_C denote the scale parameters of the treatment and control groups respectively. A subject who discontinues the trial is identified at random from a Bernoulli(π) distribution, where π denotes the drop-out rate of the trial and has to be prespecified in the macro parameters. Here, discontinuation is assumed to be completely random and the drop-out rates are assumed equal across the two groups. The censored time of a discontinued subject is not shortened: it is assumed that he/she would not have left the trial before the event occurred, so his/her time still comes from the same survival distribution as that of a subject who finishes the trial.
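A sketch of this data-generation step for a single arm (Python with hypothetical names; the authors' implementation lives in the SAS sub-macros %exp_gen and %weibull_gen):

```python
import numpy as np

def simulate_arm(n, median, gamma, dropout, t_max, rng):
    """Weibull survival times with completely random drop-outs flagged by a
    Bernoulli(dropout) draw and administrative censoring at t_max.
    Returns (time, event); event = True only for an observed death."""
    lam = np.log(2.0) / median ** gamma
    t = (-np.log(rng.uniform(size=n)) / lam) ** (1.0 / gamma)
    dropped = rng.uniform(size=n) < dropout  # censored; generated time is kept
    admin = t > t_max                        # event not observable within follow-up
    time = np.minimum(t, t_max)
    event = ~dropped & ~admin
    return time, event

rng = np.random.default_rng(123)
time_c, event_c = simulate_arm(n=197, median=4.5, gamma=1.0,
                               dropout=0.20, t_max=18.0, rng=rng)
```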

Step 2-2.

Simulate the procedure of the group sequential design, perform the interim/final analyses and calculate the assessment indexes: the overall empirical power, the stage-wise empirical power, the expected number of events, etc. This is accomplished by the following sub-steps.

Step 2-2-1.

Screen the subjects for the interim analyses. When the prespecified fraction t_i of the total planned events D has been observed in the trial, the i-th interim analysis is performed. The subjects who finish the trial before the i-th stage are screened for data analysis at the i-th interim analysis if no positive result was derived at the (i−1)-th interim analysis. Due to the conservative presumption that a subject enters the study only when the former one has finished it, no subject is at risk at the interim analysis in the simulation.

Step 2-2-2.

Perform the log-rank tests at the interim and final analyses. If the derived p_i ≤ α_i at the i-th interim analysis, the trial stops early for efficacy; otherwise, it proceeds to the subsequent stage. If no positive result has been obtained by the final analysis, the null hypothesis is accepted.
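The two-sample log-rank statistic used at each analysis can be written in a few lines. A self-contained Python sketch of the standard test (the SAS macro itself would rely on SAS's survival procedures; this is only an illustration):

```python
import numpy as np
from scipy.stats import chi2

def logrank_p(time, event, group):
    """Two-sample log-rank test; group is 0/1. Returns the two-sided p-value."""
    o_minus_e, var = 0.0, 0.0
    for s in np.unique(time[event]):           # distinct event times
        at_risk = time >= s
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = (event & (time == s)).sum()        # deaths at time s
        d1 = (event & (time == s) & (group == 1)).sum()
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1.0 - n1 / n) * (n - d) / (n - 1)
    return chi2.sf(o_minus_e ** 2 / var, df=1)

# Demo on synthetic exponential data (medians 4.5 vs 6, censored at 18 months)
rng = np.random.default_rng(123)
raw = np.concatenate([rng.exponential(4.5 / np.log(2), 200),
                      rng.exponential(6.0 / np.log(2), 200)])
time, event = np.minimum(raw, 18.0), raw <= 18.0
print(logrank_p(time, event, np.repeat([0, 1], 200)))
```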

Step 2-2-3.

Calculate the overall empirical power, the stage-wise empirical power p_i, the cumulative empirical power P_i and the expected number of events E(D) to be observed for the particular sample size.

Step 2-2-4.

Clear the temporary datasets at the end of the loop.

Step 2-3.

Judge whether the target power has been achieved and the minimum sample size found. If the target power is achieved, the loops stop early and the final sample size is recommended; otherwise, the loops continue until the prespecified ending number, and the sample size search fails. However, due to the fluctuation of the calculated empirical power for varying trial sizes in the simulation, the simulation enters a verification stage to check the stability of the calculated power when the target power is achieved for the first time. The trial size at that point is considered the initial estimate of the control-group sample size. If the calculated empirical power is still larger than the target power when the trial size is increased by 1 at the verification stage, the initial estimate is taken as the recommended final sample size of the control group. Otherwise, the control-group sample size continues to increase by 1 until the target power is achieved for the second time, and the trial size at that time is regarded as the recommended final sample size.
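One plausible reading of this search-and-verify loop, sketched in Python; simulate_power stands for the N-replication simulation of Steps 2-1 and 2-2 and is assumed to be given:

```python
def search_sample_size(n_start, n_end, target_power, simulate_power):
    """Smallest control-arm size whose empirical power reaches the target,
    re-checked once at the next size as a stability verification (Step 2-3)."""
    n_c = n_start
    while n_c <= n_end:
        if simulate_power(n_c) >= target_power:
            # verification stage: accept n_c only if the power holds at n_c + 1;
            # otherwise keep increasing until the target is met a second time
            if simulate_power(n_c + 1) >= target_power:
                return n_c
        n_c += 1
    raise ValueError("target power not reached before the ending number")
```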

Moreover, censoring in the time-to-event data is another important issue for sample size estimation. Here, two types of right-censored data are considered. The first consists of the subjects who discontinue the trial; they are identified at random from a Bernoulli distribution based on the prespecified drop-out rate. For simplicity, the discontinued subject is assumed not to leave the trial before the event would have occurred, so his/her censored time still comes from the same survival distribution as that of a subject who finishes the trial. That is to say, the survival time of the discontinued subject is also generated at Step 2-1 in the simulation; such subjects are only flagged as censored and their survival times are not changed by the discontinuation. Furthermore, subjects whose random survival time from the given survival distribution is longer than the maximum observation time are considered censored, because their events cannot be observed by time T; the survival time of such a subject is replaced by T.

Results

A practical clinical trial is employed as an example to illustrate the use of the program and to discuss the related issues. The primary objective of a placebo-controlled clinical trial was to assess the efficacy and safety of a new drug as third-line treatment in patients with metastatic colorectal cancer. The expected event of the trial was the death of the patient, and each patient would be observed for at most 18 months until he/she died of the cancer. The primary endpoint of the trial was the OS of the patient. The sample size ratio of the treatment and placebo groups was 2∶1 for ethical reasons. A group sequential design was considered by the investigators in the interest of the patients, and early stopping for efficacy was anticipated. Based on a literature review, the median OS of the placebo group was assumed to be 4.5 months, and the investigators expected the test drug to lengthen the median OS by 1.5 months compared with the placebo (M_T = 6, M_C = 4.5 months). The sample size was calculated with a type I error rate of 0.05 and a test power of 80%. The drop-out rate was given as 20%. Information time was employed for the group sequential design and O'Brien-Fleming boundaries were implemented to control the overall type I error rate. The SAS macro %n_gssur was employed to determine the optimal interim monitoring plan; the total sample size and the number of deaths to be observed were calculated at the same time. The decision process for the optimal plan is explained under the assumption of the exponential distribution for simplicity. The changes of the total sample size and the other assessment indexes for varied shape parameters under the Weibull distribution are also compared.

(1) The decision process for the optimal interim monitoring plan

To choose the optimal interim monitoring plan, a series of scenarios was evaluated with respect to their differences and the interests of the patients. The scenarios included (A) a traditional single-stage trial, (B) a two-stage trial with equally spaced information times, (C) a three-stage trial with equally spaced information times and (D) a three-stage trial with unequally spaced information times. For practical reasons, trials with four or more stages were not considered here. The sample size was estimated under the assumption of the exponential distribution for simplicity. The seed for data generation was specified as ‘123’ and 5000 trials were replicated to calculate the empirical power.

As shown in Table 2, 591 patients have to be enrolled and 423 deaths observed in the two groups to guarantee the target power if the traditional single-stage design is adopted. It is coincidental that the total sample size of the two-stage design equals that of the single-stage design, owing to the fluctuation of the calculated empirical power in the simulation; however, the expected number of events in the two-stage design clearly decreases. This trend continues in the three-stage designs, which appear better than Scenarios A and B because of the greater chance of early stopping for efficacy. However, in Scenario C the trial has only a 2.64% probability of stopping early at the first stage, which is neither practical nor cost-effective in an actual trial. On the other hand, the chance of early stopping reaches 17.72% at the first stage and rises to 55.54% by the second stage in Scenario D, which seems more reasonable and attractive; its expected number of events is also smaller than that of Scenario C. Although the total sample size of Scenario D is a little larger than those of the other three scenarios, it is still preferable to sponsors, considering that the trial has a larger and more reasonable chance of stopping early at the interim analyses. Therefore, the three-stage design with unequally spaced information times (Scenario D) is considered the optimal monitoring plan.

Table 2. The comparison of the results for different interim monitoring plans under the exponential distribution.

https://doi.org/10.1371/journal.pone.0044013.t002

(2) The comparison of the results for varied γ under the Weibull distribution

On account of the relationships among the shape parameter γ, the median survival times and the hazard ratio, the effects of varied shape parameters on the sample size, the number of events and the other assessment indexes were explored under two scenarios: (A) a hazard ratio that varies with γ while M_T = 6 and M_C = 4.5 are fixed; (B) a median survival time M_T that varies with γ while the hazard ratio is fixed at 1.333. Following the last section, the three-stage design of Scenario D was adopted in the simulation. The seed ‘123’ was again employed to generate the simulated data and all indexes were calculated from 5000 replications. The simulation results of Scenarios A and B are shown in Table 3 and Table 4 respectively.

Table 3. The comparison of the results for varied γ with fixed MT and MC under the Weibull distribution (MT = 6, MC = 4.5).

https://doi.org/10.1371/journal.pone.0044013.t003

Table 4. The comparison of results for varied γ with fixed hazard ratio under the Weibull distribution (HR = 1.333).

https://doi.org/10.1371/journal.pone.0044013.t004

As shown in Table 3 and Table 4, the stage-wise and cumulative empirical power of each stage remain stable for varied shape parameters, and their fluctuations are within a reasonable range. Changes in the shape parameter have no effect on p_i and P_i as long as the number and timing of the interim analyses are fixed, so the optimal interim monitoring plan derived under the exponential distribution still fits the trial no matter how the shape parameter varies. The total sample size, the total number of events and the expected number of events all decline with increasing γ, because increasing γ raises the hazard ratio when M_T and M_C are fixed, as depicted in Figure 2. Especially when γ<1, the three assessment indexes decrease dramatically: the total sample size exceeds 1000 when γ≤0.75, where the hazard ratio approaches 1, while the hazard ratio is so large when γ≥2.5 that the total sample size falls below 100. In Scenario B, as shown in Figure 3, the total sample size still decreases as γ rises, but its range is smaller than in Scenario A; the increasing median survival time of the treatment group, attributable to decreasing γ, results in the larger sample size. The total and expected numbers of events remain stable no matter how the shape parameter varies, because the hazard ratio and the target power are the same. As γ grows large, n, D and E(D) remain essentially unchanged and the median survival time of the treatment group approaches that of the control group. It is seen from the results of Scenarios A and B that the total sample size is more sensitive to the hazard ratio than to the median survival time of the treatment group, and that the total and expected numbers of events depend only on the hazard ratio when the target power is constant.

Figure 2. The change of n, D and E(D) for varied γ with fixed MT and MC under the Weibull distribution (MT = 6, MC = 4.5).

https://doi.org/10.1371/journal.pone.0044013.g002

Figure 3. The change of n, D and E(D) for varied γ with fixed hazard ratio under the Weibull distribution (HR = 1.333).

https://doi.org/10.1371/journal.pone.0044013.g003

Discussion

Sample size estimation is an important and indispensable part of trial design. It is more complicated for time-to-event data because of the ambiguous distributional form and the censored data, and the different definition of information time for a time-to-event endpoint makes it more difficult still. On the other hand, compared with the available theoretical methods [5]–[13], a simulation-based approach to sample size estimation is easier to implement in an actual trial and can incorporate various practical considerations, which leads to a more accurate estimate. Therefore, a practical and simple simulation-based method is proposed to calculate the sample size of two-arm survival group sequential trials, and its SAS tool is available as Program S1. In the simulation, the inclusion of the probability of discontinuation contributes to the more accurate estimate. Both the exponential and Weibull distributions are considered. Compared with Heo's method [15], which gives a formula for the sample size of a single-stage trial under the Weibull distribution, our method focuses on multi-stage group sequential designs. Besides the sample size, assessment indexes such as the stage-wise empirical power p_i, the cumulative empirical power P_i and the expected number of events E(D) are calculated in the simulation to help determine the optimal interim monitoring plan.

To implement the proposed method, one has to prespecify the survival distribution, the median survival times of the treatment and control groups, the maximum follow-up time, the number of trial stages, the interim time schedule, the drop-out rate, the sample size ratio, the target power, etc. Of them, the maximum follow-up time, the drop-out rate, the sample size ratio and the target power are prespecified according to the suggestions of the investigators and sponsors. The accrual information, including the accrual time, accrual rate and accrual distribution, does not have to be prespecified, which differs from the methods of some commercial software packages. This is because the accrual information depends on too many factors, e.g., the incidence of the disease, the number of patients at the site, the inclusion/exclusion criteria and the working efficiency of the trial monitor, so it is usually difficult for clinical investigators to predict the accrual rate and distribution. If over-optimistic accrual information is assumed, the estimated sample size cannot guarantee the power to detect the difference and the efficacy of the test drug may be missed. For that reason, it is conservatively presumed in the simulation that a subject enters the study only when the former one has finished the trial: at an interim analysis only subjects who have finished the trial are included for analysis and no one in the cohort is still at risk, so the accrual time, rate and distribution do not affect the sample size and need not be defined. Of course, this presumption results in a conservative, larger estimate of the sample size, but the group sequential design still makes it possible to reject the null hypothesis early at an interim analysis if the actual accrual is better than anticipated. In group sequential trials, the number, timing and boundaries of the interim analyses are closely related to the sample size and the number of preplanned events. Both information time and calendar time can be used by employing the calendar time-information time transformation. The stopping boundaries of the interim analyses can be calculated conveniently using SAS 9.2, the GroupSeq package of R, etc. if the Pocock, O'Brien-Fleming or error spending method is considered; as long as the overall type I error rate is well controlled, the stopping boundaries can also be determined by extensive simulations. The simulation-based method proposed here makes it possible to choose the optimal number of trial stages and the interim monitoring schedule by comparing the calculated assessment indexes. Moreover, the median survival times, the hazard ratio and the survival distribution directly contribute to the sample size and the number of events. The median survival times and the hazard ratio of the two groups reflect the efficacy of the test drug and satisfy HR = (M_T/M_C)^γ when the Weibull distribution is assumed. From the simulation results in Tables 3 and 4, it is seen that the number of events which should be observed to guarantee the target power depends only on the hazard ratio, while the number of subjects to be accrued is more sensitive to the hazard ratio than to the median survival time. When HR = 1.333 and γ>1, the sample size stays nearly the same no matter how the shape parameter varies, but the test drug cannot lengthen the survival time of the patients significantly compared with the placebo. However, when γ<1, the patients may benefit from the prolonged survival time, though a larger sample size is needed. On the other hand, when the median survival times of the two groups are fixed, the hazard ratio, which varies with the shape parameter of the Weibull distribution, directly determines the sample size. When γ<1, the low hazard ratio leads to such a large sample size that it may be unnecessary to start the trial because of the poor efficacy. When γ>1, the small sample size resulting from the high hazard ratio makes it possible to shorten the follow-up time and to accrue more patients, so the whole development of the drug is quickened and trial costs may be saved.

In an actual clinical trial, the median survival time is an intuitive endpoint for clinical investigators and reflects the benefit patients receive from the drug. It is relatively easy for the clinician to estimate the median survival times of the treatment and control groups based on the literature review and pre-studies of the drug. The hazard ratio is defined here as the ratio of the hazard rate of the event in the control group to that in the treatment group, so a test drug with hazard ratio HR is thought to decrease the risk of the event by (1 − 1/HR) compared with the control group. It is also possible for the clinical investigator to estimate the hazard ratio, with the help of the biostatistician and other participants, from pre-studies and the expectations of the test drug. Therefore, when the Weibull distribution, which is more appropriate for describing a time-to-event endpoint because of its shape parameter, is assumed in a trial, it is suggested that the acceptable ranges of the median survival time of the test drug and of the hazard ratio be estimated first in order to calculate γ. Let M_{T,L} be the minimum clinically significant median survival time of the test drug and M_{T,U} be the largest expected median survival time, and let the estimated range of the hazard ratio be [HR_L, HR_U], where HR_L and HR_U denote its lower and upper limits. According to the simulation results, the shape parameter of the Weibull distribution is then calculated conservatively by

$$\gamma = \frac{\ln(HR_L)}{\ln(M_{T,U}/M_C)}, \tag{7}$$

where M_C denotes the median survival time of the control group. We take the metastatic colorectal cancer trial mentioned above as an example of how to estimate γ explicitly. It was predicted that the test drug could lengthen the median OS by 2.5 months at best (M_{T,U} = 7 months), the median OS of the placebo group being 4.5 months, and the test drug was expected to decrease the risk of death by 25%–30% compared with the placebo. Accordingly, the range of the hazard ratio was [1.333, 1.429] and the shape parameter of the Weibull distribution was estimated as 0.651 by Eq. (7). When the three-stage group sequential design of Scenario D was employed with O'Brien-Fleming boundaries, 696 subjects and 420 deaths were necessary to achieve 80% power to detect the difference.
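Eq. (7) and the example's numbers can be checked in a couple of lines (Python sketch with a hypothetical function name):

```python
from math import log

def weibull_shape(hr_lower, m_t_upper, m_c):
    """Conservative shape estimate of Eq. (7): gamma = ln(HR_L) / ln(M_T,U / M_C)."""
    return log(hr_lower) / log(m_t_upper / m_c)

# 25% risk reduction -> HR_L = 1/0.75 = 1.333; best-case median OS 4.5 + 2.5 = 7
print(round(weibull_shape(1.0 / 0.75, 7.0, 4.5), 3))  # 0.651, as reported
```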

Whether the fluctuation of the calculated sample sizes across random seeds leads to an inaccurate estimate is another vital concern about the simulation-based approach. To explore this, 10 different random seeds were used to calculate the sample size under the assumption of the exponential distribution for the cancer trial introduced above, using the three-stage design of Scenario D. The derived sample sizes range from 588 to 606, with a mean of 598.2 and a standard deviation of 5.13; correspondingly, the number of deaths ranges from 421 to 434, with a mean of 428.1 and a standard deviation of 3.81. It appears that 10 simulations are enough to provide a robust estimate. Therefore, to obtain a robust estimate of the sample size, 10 or more simulations with different seeds are suggested, and their mean is recommended as the final sample size. Moreover, we find that the sample sizes calculated under the exponential distribution and under the Weibull distribution with a shape parameter of 1 are not exactly equal even when the same seed is specified, as seen in Tables 2 and 3. This is because different SAS functions were called to generate the random number streams, which results in the differences between the derived sample sizes. But the mean sample size under the Weibull distribution with γ = 1, estimated from 10 simulations with the same seeds as those under the exponential distribution, is 601.8, which is very close to the mean sample size under the exponential distribution.

In recent years, adaptive design, which can be considered an extension of group sequential design, has been increasingly applied. It allows sample size re-estimation after an interim analysis to cope with an inaccurate assumption about the efficacy of the drug made when designing the trial. The simulation-based approach introduced in this article, which originates from group sequential design, can also be applied to adaptive design with the following strategy. In the planning phase of the trial, a series of possible scenarios for the efficacy of the test drug is assumed in advance and the sample size scheme for each scenario is calculated by employing %n_gssur. After the interim analysis, the investigator chooses the suitable sample size for the subsequent stage from the scheme according to the interim results. With this strategy, no modification of the program is necessary, and the integrity of the adaptive design is protected to a greater extent than with methods of sample size re-calculation after the interim analysis. However, the error-spending boundaries may not fit this strategy and then have to be determined by extensive simulations; the detailed procedures for stopping boundary determination and sample size scheme calculation can be found in [27]. On the other hand, minor modifications of the program are inevitable if sample size re-calculation after the interim analysis is considered. To run the macro for sample size re-calculation, the interim data are necessary and the statistical test procedure has to be modified. Here, the log-rank test is still suggested for the interim/final analyses, and the derived p-values are pooled by employing the inverse normal method [26], [28]. With these modifications to the SAS program, it can further be applied to re-estimate the sample size in adaptive design.

Supporting Information

Program S1.

The detailed codes and sample call of the SAS program %n_gssur.

https://doi.org/10.1371/journal.pone.0044013.s001

(SAS)

Acknowledgments

We would like to thank the academic editor and anonymous reviewers for their helpful suggestions.

Author Contributions

Conceived and designed the experiments: ZJ JX. Performed the experiments: ZJ LW CL. Analyzed the data: LW. Wrote the paper: ZJ. Comments and evaluations on the method from the clinical perspective: HJ.

References

  1. Pocock SJ (1977) Group sequential methods in the design and analysis of clinical trials. Biometrika 64: 191–199.
  2. O'Brien PC, Fleming TR (1979) A multiple testing procedure for clinical trials. Biometrics 35: 549–556.
  3. Lan KKG, DeMets DL (1983) Discrete sequential boundaries for clinical trials. Biometrika 70: 659–663.
  4. Jennison C, Turnbull BW (2000) Group sequential methods with applications to clinical trials. Boca Raton: Chapman and Hall/CRC.
  5. Chow SC, Shao J, Wang H (2003) Sample size calculation in clinical research. New York: Marcel Dekker.
  6. Li Z, Murphy SA (2011) Sample size formulae for two-stage randomized trials with survival outcomes. Biometrika 98: 503–518.
  7. Togo K, Iwasaki M (2011) Sample size re-estimation for survival data in clinical trials with an adaptive design. Pharm Stat 20: 325–311.
  8. Desseaux K, Porcher R (2007) Flexible two-stage design with sample size reassessment for survival trials. Stat Med 26: 5002–5013.
  9. Lu QS, Tse SK, Chow SC (2010) Analysis of time-to-event data under a two-stage survival adaptive design in clinical trials. J Biopharm Stat 20: 705–719.
  10. Lu QS, Chow SC, Tse SK, Chi YC, Yang LY (2009) Sample size estimation based on event data for a two-stage survival adaptive trial with different durations. J Biopharm Stat 19: 311–323.
  11. Li G, Shih WJ, Wang Y (2005) Two-stage adaptive design for clinical trials with survival data. J Biopharm Stat 15: 707–718.
  12. Shen Y, Cai JW (2003) Sample size reestimation for clinical trials with censored survival data. J Am Stat Assoc 98: 418–426.
  13. Schäfer H, Müller HH (2001) Modification of the sample size and the schedule of interim analysis in survival trials based on data inspections. Stat Med 20: 3741–3751.
  14. NCSS. Group sequential log rank test. Power Analysis and Sample Size (PASS), version 11. Help, Chap. 710.
  15. Heo M, Faith MS, Allison DB (1998) Power and sample size for survival analysis under the Weibull distribution when the whole lifespan is of interest. Mech Ageing Dev 102: 45–53.
  16. Maxwell SE, Kelley K, Rausch JR (2008) Sample size planning for statistical power and accuracy in parameter estimation. Annu Rev Psychol 59: 537–563.
  17. Lee ET, Wang JW (2003) Statistical methods for survival data analysis (3rd edition). New Jersey: John Wiley & Sons, Inc.
  18. Davis DJ (1952) An analysis of some failure data. J Am Stat Assoc 47: 113–150.
  19. Epstein B, Sobel M (1953) Life testing. J Am Stat Assoc 48: 486–502.
  20. Epstein B (1958) The exponential distribution and its role in life testing. Ind Qual Control 15: 2–7.
  21. Weibull W (1939) A statistical theory of the strength of materials. Ingeniörs Vetenskaps Akademien Handlingar 151: 293–297.
  22. Weibull W (1951) A statistical distribution function of wide applicability. J Appl Mech 18: 293–297.
  23. Lakatos E (2002) Designing complex group sequential survival trials. Stat Med 21: 1969–1989.
  24. Ellenberg SS, Fleming TR, DeMets DL (2003) Data monitoring committees in clinical trials. England: John Wiley & Sons, Inc.
  25. Lan KKG, DeMets DL (1989) Group sequential procedures: calendar versus information time. Stat Med 8: 1191–1198.
  26. Wassmer G (2006) Planning and analyzing adaptive group sequential survival trials. Biom J 48: 714–729.
  27. Jiang Z, Xue F, Li C, Wang L, Cai H, et al. (2010) Design of adaptive two-stage double-arm clinical trials for dichotomous variables. Contemp Clin Trials 31: 242–250.
  28. Cui L, Hung HMJ, Wang SJ (1999) Modification of sample size in group sequential clinical trials. Biometrics 55: 853–857.