
Applications of the prediction of satisfaction design for monitoring single-arm phase II trials

  • Zohra Djeridi ,

    Roles Methodology, Project administration, Software, Supervision

    z_djeridi@univ-jijel.dz

‡ ZD is the senior author on this work.

    Affiliations Department of Mathematics, University of Jijel, Jijel, Algeria, ACED Laboratory University of 8 May 1945 Guelma, Guelma, Algeria

  • Ahlem Ghouar ,

    Contributed equally to this work with: Ahlem Ghouar, Hamid Boulares, Mohamed Bouye

    Roles Writing – original draft

    Affiliation Preparatory Department, Higher School of Management Sciences-Annaba, Annaba, Algeria

  • Hamid Boulares ,

    Contributed equally to this work with: Ahlem Ghouar, Hamid Boulares, Mohamed Bouye

    Roles Formal analysis

    Affiliations ACED Laboratory University of 8 May 1945 Guelma, Guelma, Algeria, Department of Mathematics, University of 8 May 1945 Guelma, Guelma, Algeria

  • Mohamed Bouye

    Contributed equally to this work with: Ahlem Ghouar, Hamid Boulares, Mohamed Bouye

    Roles Writing – original draft

    Affiliation Department of Mathematics, College of Science, King Khalid University, Abha, Saudi Arabia

Abstract

The prediction of satisfaction design, for trials with binary endpoints, is an innovative strategy for phase II trials. We explain this hybrid frequentist-Bayesian strategy with a detailed statistical plan and thorough findings, including a description of study design features such as the sample size and the beta prior distribution, to simplify the Bayesian design. We also provide a set of tables and figures covering the stopping boundary for futility, the prediction of satisfaction, performance (type I error, power, and the probability of early termination, PET), and a sensitivity analysis of the prediction of satisfaction. The statistical plan assesses the operating characteristics through a simulation study. Several trial examples from phase II lung cancer studies demonstrate the approach’s practical use. The prediction of satisfaction design offers a flexible method for clinical studies, and this statistical study broadens the scope of its application.

1 Introduction

Bayesian analysis of phase II clinical trials has become an integral part of the development and dissemination of therapeutic drugs and of surgical and technological interventions [13]. The routine design and analysis of phase II clinical trials is a constant source of inspiration and creativity for Bayesians interested in medical statistics. They combine the frequentist (classical) use of type I and type II errors, the Fisherian use of tail areas as standardised measures of evidence, predictive probability, and stopping boundaries for sequential analysis that carefully preserve the type I error [2, 4, 5].

Recently, much work has been done in the sequential setting, where a trial has several interim analyses and can be stopped for futility and/or efficacy after any stage, rather than waiting for the completion of the entire trial. This allows ongoing monitoring of the accumulating data and enables early termination or modification of the trial based on predefined stopping rules. For example, in phase II cancer clinical trials, multi-stage designs are frequently employed. Especially for the phase II single-arm (uncontrolled) design, many Bayesian sequential techniques have been proposed, based on the posterior probability, the predictive probability, or even a combination with a frequentist approach [6]. For example, Thall and Simon [7] utilized the posterior probability to build stopping criteria for a continuous monitoring scheme, whereas Lee and Liu [8], Saville et al. [3] and Chen et al. [6] used the predictive probability to construct the boundary for multi-stage designs. Sambucini, in 2021 [9], described a number of Bayesian approaches to performing interim analyses in single-arm trials based on a binary response variable. Djeridi and Merabet (2019) [10] proposed a hybrid frequentist-Bayesian approach that defines the stopping boundary using the prediction of satisfaction. The efficacy θ0 of an experimental therapy E is evaluated in this design using data from an uncontrolled trial of E (single-arm, i.e., all participants receive the same treatment or intervention being investigated). The experiment continues until E demonstrates a high prediction of satisfaction for being promising or not promising, or until a pre-set maximum sample size is reached. The idea is to assess the probability of achieving a desired outcome (one that assures the efficacy of the therapy) at the trial’s scheduled end point, based on the observed interim result.

The major problem in sequential testing procedures is the inflation of the significance level caused by repeatedly analysing the data as they accumulate. The index of satisfaction, an increasing function of the significance (p-value) of the result that would be obtained if the trial continued to its term, resolves this problem and ensures that the type I error of the final analysis is controlled. Furthermore, by using Bayesian predictive techniques, the prediction of satisfaction is simpler to interpret and more helpful for decision making.

Bayesian group sequential approaches, in general, are not intended to optimize frequentist operating characteristics, but rather to offer stopping criteria for eventual decision making [1]. The prediction of satisfaction design, however, controls the overall false positive rate by employing the p-value of a test that would conclude a positive result at the end of the trial, based on the current stage’s cumulative information. In this study, we argue that the prediction of satisfaction design offers sufficient test power. We also discuss sample size selection and monitoring criteria, and present directions for numerical implementation. Based on the predictive distribution of the observed success rate, this strategy provides criteria for early termination of trials that are unlikely to lead to conclusive results.

The outline of the paper is as follows: in section 2, we establish the concept of the index and the prediction of satisfaction in a sequential setting, and discuss the choice of the prior distribution and the maximum sample size, along with some modelling considerations. In section 3, we present the sensitivity analysis for multi-stage plans. Several real examples that illustrate the use of this approach in the design of clinical trials are presented in section 4. Finally, the paper closes with a discussion.

2 Analytical method

2.1 Concept of the index of satisfaction in a sequential setting

Typically, when performing a hypothesis test, a statistician uses the index of satisfaction to anticipate discovering a significant conclusion by the end of the study; that is, rejecting the null hypothesis H0 of an ineffective treatment.

Correspondingly, the statistician is more satisfied when, based on the experimental data, the effect appears more significant, so it is natural to consider the level at which the result would still appear significant. Therefore, if no significant effect is observed, the index of satisfaction is null; otherwise, it is an increasing function of the conventional indicator of significance, the p-value of the final result in test theory [11]. If a trial yields a positive result, early termination allows the new product to be used sooner; if a negative result is obtained, early termination guarantees that resources are not wasted [12]. It is therefore useful to predict the level of satisfaction by a weighted average of this index with respect to the predictive probabilities over all outcomes that, if observed, would lead to the rejection of the null hypothesis, conditioned on the first-stage outcome [13]. Given the current status, the predicted level of satisfaction is the chance of establishing treatment effectiveness at the conclusion of the trial.

This approach is useful for interim monitoring: the trial continues as long as the prediction of satisfaction surpasses a pre-specified threshold. If the prediction of satisfaction is lower than the threshold, the therapy is regarded as ineffective and early termination is contemplated. When it is close to one, the likelihood of success is high.

The prediction of satisfaction is a simple but powerful concept to construct a phase II clinical design. Fig 1 summarizes the algorithm of the concept above.

Fig 1. Flow chart of the PS-design for futility interim analysis.

https://doi.org/10.1371/journal.pone.0305814.g001

2.2 Notations

The statistical model used in this method is based on the unknown success probability parameter θ ∈ [0, 1]. Let X_i denote the binary outcome variable for patient i, such that X_i = 1 if the treatment is a success and X_i = 0 otherwise.

The random “total number of successes” acquired at the end of the trial out of n patients is denoted by Z = X_1 + … + X_n, with the binomial sampling distribution B(n, θ).

In a Bayesian conjugate assessment, we take a Beta prior for θ, p(θ) = Beta(θ; a, b), and derive the posterior distribution after observing z successes among n patients: θ | z ~ Beta(a + z, b + n − z). The choice and elicitation of the prior are explained in the following section.

Furthermore, the Bayesian procedure has the advantage of allowing us to analytically determine the predictive distribution of the number of responses in a future sample given the recently updated response rate; as is well known, this distribution is a beta-binomial [14].

Let θ0 denote a fixed value representing the success probability of the control or standard therapy according to past data. We assume that the therapy is promising if its efficacy probability surpasses the target value θ0.

2.3 Elicitation of the prior

The prior distribution of the unknown parameter reflects past knowledge about the efficacy of the new therapy; we utilize it to obtain both the posterior distribution and the predictive distribution of the number of successes.

For binary observations, knowledge regarding the experimental treatment’s response data aids in determining the Beta prior distribution, p(θ) = Beta(θ; a, b). While many experimental therapies are generally the first in a series of studies, some involve a mix of standard therapy and a novel medicine or a change of regular treatment [6]. Thus, using historical data to better shape prior distribution within the Bayesian context should be beneficial.

The mean response rate of the Beta distribution is a/(a + b). As (a + b) grows, the confidence in the prior information becomes stronger and is more likely to influence the conclusions. The investigator can therefore specify a response rate estimate θ*, the mean of the prior distribution of θ, and express his degree of confidence in θ* via a coefficient of variation (CV) of the prior: a small CV indicates high confidence that θ is close to θ*, while moderate and large values indicate moderate and low confidence, respectively.

Let θ* be the prior mean with coefficient of variation CV. Matching the first two moments of the Beta distribution (mean θ* = a/(a + b) and variance θ*(1 − θ*)/(a + b + 1)) gives the design prior hyper-parameters: a_D = θ* s and b_D = (1 − θ*) s, where s = (1 − θ*)/(θ* CV²) − 1 and D indicates the hyper-parameters of the design prior.

In the sequential use of Bayes theorem, the posterior from the first stage of the research simply turns into the prior for the second, and the ultimate posterior distribution follows in the same way [10].
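The moment-matching step above can be sketched in code. Below is an illustrative Python translation (the paper's own computations are in R); the function name is ours. It reproduces the hyper-parameter values a = 69.7 and b = 162.63 quoted in the sensitivity analysis of section 3.3.

```python
def beta_hyperparams(mean, cv):
    """Moment-match a Beta(a, b) prior to a given mean and coefficient
    of variation: E[theta] = a/(a+b), Var[theta] = mean*(1-mean)/(a+b+1),
    hence CV^2 = (1-mean)/(mean*(a+b+1))."""
    s = (1.0 - mean) / (mean * cv ** 2) - 1.0  # s = a + b
    if s <= 0:
        raise ValueError("CV too large for this prior mean")
    return mean * s, (1.0 - mean) * s

# Prior centered at 0.3 with CV = 10%, as in the sensitivity analysis
a, b = beta_hyperparams(0.30, 0.10)
print(round(a, 2), round(b, 2))  # 69.7 162.63
```

In the sequential setting, the same function can be reused at each stage since the posterior is again a Beta distribution.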

2.4 Prediction of satisfaction in a binomial sequential setting

Designing a trial using the PS approach requires searching for the maximum sample size (Nmax), the cohort size of the current stage (nk), the minimum outcome in the current stage, and the maximum outcome in Nmax (qα) for which the treatment is considered efficient by the end of the trial, such that the constraints on the type I error, the type II error and γ are satisfied.

The sample size is assigned at each stage, with nk representing the number of patients in the kth interim analysis and Nmax the overall sample size. Generally, a design whose type I and type II errors satisfy the constraints will be selected.

At the interim look, let z_k denote the number of responders observed in the current cohort of nk patients, so that the trial would end with z = z_k + y responders if it continued to its term. The predictive density function of the number of responders Y among the remaining m* = Nmax − nk patients, given the recently updated response rate, may be computed analytically as the beta-binomial density

P(Y = y | z_k) = C(m*, y) B(a + z_k + y, b + nk + m* − y) / B(a + z_k, b + nk), for y = 0, …, m*,

where C(m*, y) is the binomial coefficient and B(·, ·) here denotes the Euler beta function. For a test of level α, the null and alternative hypotheses are:

H0: θ ≤ θ0 versus H1: θ > θ0, (1)

where θ0 is the target response rate corresponding to the desirable level of treatment efficacy. Of course, rejecting H0 means accepting treatment efficacy. In sequential trials, it is of interest to consider at what level the final result would be significant. The test p-value at the planned conclusion is defined by p(z) = inf{ϵ; z ∈ R(ϵ)}, where R(ϵ) is the critical region. It is used to calculate the satisfaction of the investigator, denoted ϕ(z) and calculated as in [12] as an increasing function of the significance of the final result; for the binomial model, the index of satisfaction is expressed through B(·), the cumulative distribution function of the binomial model, and the rejection region is R(α) = {z ≥ qα} with qα = inf{u; Pr(Z ≥ u | θ0) ≤ α}.

Given economic and ethical constraints, it is often unnecessary to continue a clinical study after an interim analysis if we can reliably predict its final conclusion. Based on the interim result z_k, it is possible to predict what the satisfaction will be at the conclusion of the experiment. In the spirit of stochastic curtailment methods, we forecast the degree of satisfaction by averaging the index of satisfaction with respect to the predictive probability over the entire outcome space, conditioned on the current-stage result [10, 13]. Thus, at each interim analysis the prediction of satisfaction is updated by:

π(z_k) = Σ_{y=0}^{m*} ϕ(z_k + y) P(Y = y | z_k). (2)

This quantity can be used as a stopping rule for interim monitoring: the trial continues only while the prediction of satisfaction exceeds a given threshold, and we declare the therapy effective (or promising) if the prediction of satisfaction at the current stage is higher than the threshold γ (γ ∈ [0.5; 1]).

Note that the prediction of satisfaction replaces the power of the test in the theory of the index of satisfaction recommended by Djeridi and Merabet [10]. The investigator is responsible for setting the threshold on the prediction of satisfaction at each interim analysis below which the pursuit of the experiment is discouraged.

For example, we define the design parameters as follows: θ0 = 0.3, θ1 = 0.5, α = 0.1 and 1 − β = 0.8, for a frequentist maximum sample size of Nmax = 32 and a design maximum sample size of Nmax = 27, under two scenarios: (a) an informative prior (p(θ) = Beta(aD, bD)) and (b) a non-informative prior (p(θ) = Beta(1, 1)). We plan an interim analysis after enrolling 10 patients; that is, K = 1 and n1 = 10. Assume the prediction of satisfaction (PS) threshold for inefficacy stopping is γ = 0.5. Table 1 displays the prediction of satisfaction for success counts z1 = 0, …, 10 at the first interim look. If z1 ≤ 2 in scenario (a), or z1 ≤ 3 in scenario (b), the prediction of satisfaction π(z1) is smaller than 0.5 (the difference is a consequence of the lack of information about θ in scenario (b)), and the study is terminated for ineffectiveness. Otherwise the experiment continues to the final analysis. The sample size was chosen by comparing the overall frequentist operating characteristics (section 2.5).

Table 1. Prediction of satisfaction (PS) at the first interim look when θ0 = 0.3, θ1 = 0.5, α = 0.1, (1 − β) = 0.8 and n1 = 10 for two scenarios (a) p(θ) = beta(aD, bD) for and CV = 10% and (b) p(θ) = beta(1, 1).

https://doi.org/10.1371/journal.pone.0305814.t001
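To make the Table 1 computation concrete, here is a minimal Python sketch of the prediction of satisfaction under a flat prior. The beta-binomial predictive density is standard; however, the paper's exact satisfaction index ϕ comes from the final-test p-value [10, 12], so as a crude simplifying assumption we plug in the indicator ϕ(z) = 1{z ≥ qα}, which turns the prediction of satisfaction into the predictive probability of a significant end result. The value qα = 14 below is purely illustrative, not the paper's boundary.

```python
from math import comb, lgamma, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binom_pmf(y, m, a, b):
    """P(Y = y) for Y ~ BetaBinomial(m, a, b)."""
    return comb(m, y) * exp(log_beta(a + y, b + m - y) - log_beta(a, b))

def pred_satisfaction(z, n, n_max, q_alpha, a=1.0, b=1.0, phi=None):
    """Prediction of satisfaction after z successes among n patients.

    phi is the index of satisfaction applied to the final success count.
    The paper derives phi from the p-value of the final test; here, as a
    simplifying assumption, we default to the indicator 1{final >= q_alpha},
    reducing pi(z) to the predictive probability of a significant end.
    """
    if phi is None:
        phi = lambda total: 1.0 if total >= q_alpha else 0.0
    m = n_max - n
    a_post, b_post = a + z, b + n - z   # posterior after the interim look
    return sum(phi(z + y) * beta_binom_pmf(y, m, a_post, b_post)
               for y in range(m + 1))

# First interim look of scenario (b): flat prior, n1 = 10, N_max = 27;
# q_alpha = 14 is an illustrative assumption.
ps = [pred_satisfaction(z1, 10, 27, 14) for z1 in range(11)]
```

As expected, the resulting values increase with the interim success count z1, so a futility threshold γ translates into a cut-off on z1.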

2.4.1 Set up a stopping rule.

The prediction of satisfaction is a useful tool as a stopping rule: when it is low, trials can be ended prematurely for futility. Setting a threshold γ (γ ∈ [0.5; 1]) on the prediction of satisfaction to indicate an unfavorable probability of success, the stopping boundary at the kth stage is the largest number of responders z_k for which π(z_k) < γ; the trial is stopped for futility whenever the observed number of responders does not exceed this boundary.

For example, for two-stage trials, we search for the design that meets the error probability criteria and either minimizes the maximum sample size (the minimax design) or minimizes the expected sample size (the optimal design), as follows:

For each value of the total sample size Nmax and each value of n1 in the range (10 : Nmax − 10), we determine the smallest cohort size and stopping boundary and verify the constraints on the type I and type II errors. Of course, the maximum sample size Nmax and the boundary level qα required for the second stage drastically affect the resulting boundary values. As a result, if the number of observed responders is less than or equal to the stage boundary, we terminate the study for treatment ineffectiveness.
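The search loop just described can be prototyped as follows. This Python sketch (the paper uses R) enumerates a futility boundary r1 and a final efficacy threshold q for a futility-only two-stage design using exact binomial probabilities; all function names are ours and the parameter values in the demo call are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(j, n, p) for j in range(max(k, 0), n + 1))

def two_stage_oc(r1, q, n1, n_max, theta):
    """Operating characteristics of a futility-only two-stage design:
    stop for futility after stage 1 if z1 <= r1; declare efficacy at
    the end if the total success count reaches q."""
    n2 = n_max - n1
    reject = sum(binom_pmf(z1, n1, theta) * binom_sf(q - z1, n2, theta)
                 for z1 in range(r1 + 1, n1 + 1))
    pet = sum(binom_pmf(z1, n1, theta) for z1 in range(0, r1 + 1))
    return reject, pet

def search_designs(theta0, theta1, alpha, beta, n_max, n1):
    """Enumerate boundary pairs (r1, q) meeting the error constraints."""
    feasible = []
    for q in range(1, n_max + 1):
        for r1 in range(0, n1):
            type1, _ = two_stage_oc(r1, q, n1, n_max, theta0)
            power, _ = two_stage_oc(r1, q, n1, n_max, theta1)
            if type1 <= alpha and power >= 1 - beta:
                feasible.append((r1, q))
    return feasible

# Illustrative call: theta0 = 0.3, theta1 = 0.5, alpha = 0.10,
# beta = 0.20, N_max = 32 (the frequentist one-stage size), n1 = 10
designs = search_designs(0.3, 0.5, 0.10, 0.20, 32, 10)
```

Among the feasible pairs one would then pick the design minimizing Nmax (minimax) or the expected sample size (optimal).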

Analogously, if the study continues, the (K − 1)th observation is obtained before the final interim analysis, so we plan a further cohort of patients accordingly. We then repeat the method to obtain, at each kth stage, the stopping boundary: the number of responders below which the chance of success is deemed unfavorable.

The rule can also be used to stop early for efficacy when the prediction of satisfaction is very high. However, this is less feasible because of the limited sample size of early phase II studies [10].

2.5 Remarks about maximal sample size determination

Compared with standard fixed-sample procedures, sequential methods save sample size, time and cost. We determine the initial sample size N on the basis of the frequentist one-stage design, which is

N = [ ( u_α √(θ0(1 − θ0)) + u_β √(θ1(1 − θ1)) ) / (θ1 − θ0) ]², (3)

where u_ρ is the standardized normal variate exceeded with probability ρ.
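Formula (3) can be evaluated directly. Below is a Python sketch using the standard normal-approximation sample size for a one-sided single-arm binomial test; we assume this common variant of the formula, and exact binomial designs (as reported in brackets in the paper's tables) may require a somewhat larger N.

```python
from math import sqrt, ceil
from statistics import NormalDist

def one_stage_sample_size(theta0, theta1, alpha, beta):
    """Normal-approximation sample size for the one-sided single-arm
    test H0: theta <= theta0 vs H1: theta > theta0 (formula variant
    assumed; exact-test designs may need a larger N)."""
    u = NormalDist().inv_cdf                     # quantile function
    u_alpha, u_beta = u(1 - alpha), u(1 - beta)  # exceeded w.p. alpha, beta
    num = (u_alpha * sqrt(theta0 * (1 - theta0))
           + u_beta * sqrt(theta1 * (1 - theta1)))
    return ceil((num / (theta1 - theta0)) ** 2)

# theta0 = 0.3, theta1 = 0.5, alpha = 0.10, power = 0.80
n = one_stage_sample_size(0.3, 0.5, 0.10, 0.20)
```

This starting value is then refined by the enumeration over Nmax described below.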

As there may be some uncertainty in the design prior during the planning phase, the preliminary sample size should be re-evaluated using interim data; one way to modify the maximal sample size Nmax is to revise it during the interim evaluations. The need for a large sample size is tempered by ethical considerations as well as resource constraints. The search over Nmax began at the value given by (3). We verified below this starting point to confirm that we had identified the smallest sample size Nmax for which a non-trivial multi-stage design met the error probability constraints; the enumeration algorithm then searched upwards from this value of Nmax until the optimum was clearly identified.

Table 2 displays the required sample size Nmax for various criteria and values of θ0, θ1, CV, α and (1 − β). Note that CV = 15% corresponds to a 95% credible interval range of around 0.20. We observe that, other factors held constant, the larger α and (1 − β) are, the larger the sample size is. The choice of θ1 has some influence on the sample size, and CV has a substantial effect if θ1 is near θ0. In all cases, the PS design reduces the sample size relative to that obtained from the frequentist approach (the reduction ranges from roughly 10% to 50%) corresponding to the hypothesis-testing framework (1) with type I error α and power 1 − β. Table 2 shows the frequentist results in brackets.

Table 2. Sample size Nmax for different criteria and values of θ0, θ1, CV, α, and 1 − β for (p(θ) = Beta(aD, bD)).

https://doi.org/10.1371/journal.pone.0305814.t002

2.6 Modelling considerations

Under the prediction of satisfaction framework, two approaches to Bayesian interim analysis are described in this paper. The first focuses on decision analysis, considering “the cost of decision errors and the prediction of satisfaction of the given outcomes in deciding whether to stop early”. In this approach, the design is specified by the maximum sample size (Nmax), the cohort size of the current stage (nk), the minimum outcome in the current stage and the maximum outcome in Nmax (qα) for which the treatment is considered efficient by the end of the trial, such that the constraints on type I error, type II error and γ are satisfied (this approach is used to find the minimax design in the previous section). The second approach is to calculate the prediction of satisfaction for any maximum sample size and any interim cohort size to find the design’s plan; this design requires no appeal to the power for its evaluation. In both cases, once the rule is defined, we have to determine the design’s frequentist operating characteristics (type I and type II errors, the probability of early termination and the expected sample size).

Because the stopping rule characterizes the study design, we have to provide the stopping boundary at each stage. The frequentist operating characteristics are then given by:

1- Let b(.) and B(.) be the probability mass function and the cumulative distribution function of the binomial distribution, respectively, and let r_k denote the futility boundary at stage k. For a two-stage design with cohort sizes n1 and n2 = Nmax − n1, the rejection probability after the first stage is:

RP(θ) = Σ_{z1 = r1 + 1}^{n1} b(z1; n1, θ) [1 − B(qα − z1 − 1; n2, θ)].

Hence, in this design, the probability of early termination (PET) after the first stage is:

PET(θ) = B(r1; n1, θ).

Furthermore, the expected sample size is:

E(N | θ) = n1 + [1 − PET(θ)] n2.

2- For a three-stage design, the rejection probability becomes:

RP(θ) = Σ_{z1 = r1 + 1}^{n1} b(z1; n1, θ) Σ_{z2 = max(0, r2 − z1 + 1)}^{n2} b(z2; n2, θ) [1 − B(qα − z1 − z2 − 1; n3, θ)],

and the probability of early termination (PET) is then:

PET(θ) = B(r1; n1, θ) + Σ_{z1 = r1 + 1}^{n1} b(z1; n1, θ) B(r2 − z1; n2, θ).

3- For a k-stage design (k > 3), the rejection probability and the probability of early termination generalize in the same nested manner, summing over the continuation region z1 + … + zj > rj at every interim stage j and requiring the final total to reach qα.

4- Finally, the expected sample size is:

E(N | θ) = n1 + Σ_{j=2}^{K} nj Pr(the trial reaches stage j | θ).
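The multi-stage operating characteristics above (rejection probability, PET, expected sample size) are easiest to evaluate by propagating the distribution of the cumulative success count stage by stage. The following is a dependency-free Python sketch (our own implementation, not the paper's R code); the boundaries and threshold in the demo call are illustrative assumptions.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def k_stage_oc(cohorts, boundaries, q, theta):
    """Exact operating characteristics of a futility-only multi-stage design.

    cohorts[k]    -- patients enrolled in stage k
    boundaries[k] -- stop for futility after stage k if the cumulative
                     success count is <= boundaries[k] (no look after the
                     last stage); efficacy requires a final total >= q.
    Returns (P(reject H0), P(early termination), E[sample size]).
    """
    dist = {0: 1.0}              # cumulative-successes distribution, ongoing trials
    pet, e_n, enrolled = 0.0, 0.0, 0
    for k, n_k in enumerate(cohorts):
        enrolled += n_k
        new = {}
        for z, pr in dist.items():
            for y in range(n_k + 1):
                new[z + y] = new.get(z + y, 0.0) + pr * binom_pmf(y, n_k, theta)
        if k < len(cohorts) - 1:                      # interim futility look
            stopped = sum(p for z, p in new.items() if z <= boundaries[k])
            pet += stopped
            e_n += stopped * enrolled
            dist = {z: p for z, p in new.items() if z > boundaries[k]}
        else:
            dist = new
    e_n += sum(dist.values()) * enrolled
    reject = sum(p for z, p in dist.items() if z >= q)
    return reject, pet, e_n

# Illustrative 3-stage plan: cohorts of 10, futility boundaries 2 and 6,
# final efficacy threshold q = 14 out of N_max = 30 (assumed values).
rej0, pet0, en0 = k_stage_oc([10, 10, 10], [2, 6], 14, 0.3)
rej1, pet1, en1 = k_stage_oc([10, 10, 10], [2, 6], 14, 0.5)
```

With a single stage and no boundaries, the function reduces to the usual binomial tail probability, which gives a quick correctness check.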

3 Numerical evaluation

The benchmarks for studying the proposed design are the type I and type II error rates and PET. The most well-known frequentist multi-stage designs for phase II clinical trials aim to control these errors at pre-specified levels (see [9, 15]). The small variation in operating characteristics between admissible plans indicates the robustness of these plans to slight variations in the stage-specific sample sizes actually attained [16]. It is therefore interesting to demonstrate how the suggested design performs in terms of these two error probabilities together with the probability of early termination.

3.1 Sensitivity analysis of prediction of satisfaction plans

Conducting sensitivity analysis offers a chance to assess the durability of the plan and optimize it by modifying crucial variables. Four performance-related criteria are investigated: (i) γ, the threshold of the prediction of satisfaction, (ii) α, type I error rate, (iii) Nmax, the maximum sample size, (iv) The hyper-parameters of the prior distribution. When all other factors are held constant, the influence of each parameter is studied. We investigate three cases: two, three and multi-stage. The design plans are derived numerically by parallel computing utilizing a self-written algorithm in R.

3.2 Results

We consider θ0 = 30% as a suitable target level of the treatment effectiveness probability and θ1 = 50% (the minimum favourable response rate). The significance level α is taken as 5% and 10%, with a type II error of 20%. The predefined sample size is Nmax = 50 and we assume the uniform prior distribution (p(θ) = Beta(1, 1)). If the prediction of satisfaction is more than 0.5, the therapy is regarded as promising. Thus, with a total of 50 patients, asserting efficacy by the index of satisfaction requires at least 20 responses (i.e., qα = 20) for α = 0.05 and at least 19 responders (i.e., qα = 19) for α = 0.10.
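The thresholds qα = inf{u : Pr(Z ≥ u | θ0) ≤ α} can be checked with a few lines of Python (our own helper; depending on the exact tail convention and any approximation used, the computed thresholds may differ by one from the values quoted above).

```python
from math import comb

def binom_sf(k, n, p):
    """P(Z >= k) for Z ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def q_alpha(n, theta0, alpha):
    """q_alpha = inf{u : Pr(Z >= u | theta0) <= alpha}."""
    for u in range(n + 2):           # u = n + 1 gives an empty tail (prob 0)
        if binom_sf(u, n, theta0) <= alpha:
            return u

q05 = q_alpha(50, 0.3, 0.05)   # threshold at alpha = 0.05
q10 = q_alpha(50, 0.3, 0.10)   # threshold at alpha = 0.10
```

By construction each returned threshold is the smallest count whose upper-tail probability under θ0 falls at or below α.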

Table 3 presents two-stage, three-stage and five-stage PS plans for comparison.

Table 3. Prediction of satisfaction at the first interim look when θ0 = 0.3, θ1 = 0.5, for two scenarios (a)α = 0.05 and (b) α = 0.10, where (1 − β) = 0.8 and n1 = 10.

https://doi.org/10.1371/journal.pone.0305814.t003

The risk of type I error decreases as the number of interim looks increases. This is due to the use of the p-value, which preserves the overall significance level under repeated hypothesis testing with accumulating data [17]. Type II errors also decrease slightly. Furthermore, as the number of interim looks grows, PET increases somewhat and the expected sample size, E(N/θ1), drops. On the other hand, the expected sample size when θ = θ1 is smaller than the expected sample size when θ = θ0, because the design prior expresses confidence in an effective treatment.

The prediction of satisfaction design generally has a larger boundary when the type I error rate is smaller, which affects the type II error probability; a larger nk is then needed to control the type II error probability. Thus, although the stopping boundary, and therefore PET under the null, is higher, the PS design still has a larger expected sample size for a small type I error rate, though still smaller than that of a one-stage design.

3.3 Sensitivity analysis for the two-stage strategy

To summarise, the sensitivity of the two-stage design’s performance to the essential parameters is illustrated as follows:

  1. (i) Threshold of the prediction of satisfaction: as shown in Fig 2, if γ increases from 0.40 to 0.85, the type I error and the power decrease (from 0.05 down to 0.015 and from 0.85 down to 0.41, respectively) while PET increases (from 0.96 to 0.98). The influence is significant on the type I error, mild on power, and minor on PET.
  2. (ii) The joint effect of α and γ: we investigate the combined impact of α (0.01, 0.05 and 0.10) and the PS threshold γ (from 0.40 to 0.85). It is evident in Fig 3 that PET increases as γ increases while the type I error and the power decrease. The difference between the values of the three metrics becomes negligible as γ increases, especially for PET. According to the findings, the joint effect has a strong influence on the type I error and PET and a moderate effect on power.
  3. (iii) Cohort size: a change in sample size has little effect on power or PET, but it reduces the type I error rate (Fig 4). The impact on the three metrics is moderate, owing to the discreteness of the model’s distributions.
  4. (iv) Beta prior: the results show that non-informative priors have little impact on the three metrics, because both parameters reflect the numbers of responses and non-responses (Fig 5).
Fig 2. Sensitivity analysis in two-stage plan by altering the threshold for predicting satisfaction.

https://doi.org/10.1371/journal.pone.0305814.g002

Fig 3. Sensitivity investigation of the combined effect of both threshold of the prediction of satisfaction and α in the two-stage example.

https://doi.org/10.1371/journal.pone.0305814.g003

Fig 4. Sensitivity analysis by changing sample size in the two-stage plan.

https://doi.org/10.1371/journal.pone.0305814.g004

Fig 5. Sensitivity analysis by altering prior hyper-parameters in the two-stage plan.

https://doi.org/10.1371/journal.pone.0305814.g005

A comparable impact is produced by a prior determined by the response rate at the null or alternative hypothesis with a large CV (e.g., CV ≥ 0.4). In contrast, if CV is small (e.g., CV < 0.4), (a + b) significantly drives the conclusions. For example, for a prior centered at the null rate of 0.3 with CV = 10% (a = 69.7 and b = 162.63), claiming effectiveness requires at least 19 responders (out of 50 patients), and the experiment is terminated early if there are 11 responders or fewer in the first stage. Consequently, the experiment might end in the first stage with PET = 98% but a low power of 65%. For a prior with a response rate of 0.5 (the alternative) and a CV of 10%, however, at least 8 responders are required to declare efficacy; in this case PET = 68%, with a power of 98% and a large type I error (0.32).

3.4 Detailed sensitivity analysis for multi-stage plan

The three-stage and multi-stage plans assume the uniform prior distribution. Each of these plans has slightly greater power and a slightly smaller significance level, with a smaller E(N/θ1), than the two-stage plan, owing to the additional early-termination opportunities. Across the different numbers of stages, as the number of interim analyses increases, the power increases and the type I error rate decreases. Moreover, due to the strict condition imposed by using the p-value of the final significance conclusion, the stopping boundaries in the different plans are larger, so the probability of early termination is larger than in other trial designs.

4 An illustrative example

The major goal of this research is to see how effectively a combination of nivolumab, ipilimumab, and nintedanib works in advanced non-small cell lung cancer (NSCLC) (NCT03377023 on clinicaltrials.gov).

A phase II single-arm trial was conducted in two cohorts: immunotherapy-naïve patients and previously immunotherapy-treated patients. The overall response rate (ORR) was the primary endpoint, with success defined as a complete or partial response. Each cohort’s sample size was set at 40 patients, enrolled in two stages of 20 patients each.

For the immunotherapy-naïve cohort, consider a null hypothesis of 0.30 ORR and an alternative hypothesis of 0.50 ORR, with a one-sided type I error of 0.05 and a power of 0.8. After analyzing 40 eligible patients, the final ORR was 42.5% (17/40). Partial responses were seen in 9 of the first 20 and 17 of all 40 recruited patients (no complete responses). To demonstrate the suggested technique, we set the design parameters θ0 = 0.30, α = 0.05, K = 2, n1 = n2 = 20 and Nmax = 40, and assume the futility-stopping criterion is a prediction of satisfaction ≤ 0.5. With no prior knowledge of the unknown parameter θ, the flat prior Beta(1, 1) is a natural choice. Because the ORR at the interim analysis was 45% (9/20), the predicted level of satisfaction is 0.66 > 0.50; the trial is not terminated for inefficacy, and the therapy merits further investigation in phase IIB or III studies. In this case, the operating characteristics were a type I error of 0.05, a power of 0.75 and PET = 95% (Table 4).

Table 4. Comparison of prediction of satisfaction to Bayesian predictive probability design.

https://doi.org/10.1371/journal.pone.0305814.t004

For the previously immunotherapy-treated cohort, let the null hypothesis be an ORR of 0.07 versus an alternative of 0.20, with a one-sided type I error of 0.05 and a power of 0.8. The therapy is considered effective if the total number of responders is ≥ 6 (an ORR of 15%, 6/40), and the stopping rule is ≤ 2 responders at the interim look. The prediction of satisfaction is π(x1 = 3) = 0.63; the design has a 5% type I error, 80% power and a PET of 95%.

4.1 Comparison of the prediction of satisfaction with the predictive probability design

Lee and Liu (2008) [8] suggested a Bayesian phase II design in which the futility criterion at any interim analysis is based on the predictive probability that the trial will yield a conclusive result at the planned end of the study, given the observed data.

Suppose that among the current n patients, zn responses are observed, and let Y be the random variable reflecting the number of responses among the future Nmax − n patients. The posterior predictive distribution of Y is beta-binomial. When the study is completed and the outcome y is obtained, the experimental therapy is considered sufficiently promising if the following criterion is met:

Pr(θ > θ0 | zn, Y = y) ≥ θT,

where θT is a pre-specified probability threshold. Meanwhile, as Y has not yet been realized, the experiment should be carried out, and the posterior predictive distribution may be used to compute the probability of a positive conclusion at the maximum intended sample size:

PP = Σ_{y=0}^{Nmax − n} Pr(Y = y | zn) I{Pr(θ > θ0 | zn, Y = y) ≥ θT},

where I{.} denotes the indicator function. A low PP score suggests that the novel drug will most likely be ruled inefficient by the end of the trial.

Using the predictive probability design with the same flat prior and a threshold of 20% for the PP, the trial in immunotherapy-naïve patients will stop for futility whenever the observed number of responses falls at or below the corresponding stopping boundary, and the same rule, with its own boundaries, applies to the trial in previously treated patients. For the two trials, the prediction of satisfaction was 0.08, which would not meet the stopping rules. Moreover, the prediction of satisfaction design has a higher PET and smaller power. This is due to the calibration of the probability thresholds, which ensures a type I error rate of 0.05 and a power of 80% (Table 4). Therefore, as more patients are enrolled, a higher PET is obtained by applying progressively stricter futility rules.

5 Discussion

Sequential analyses in clinical trials are essential for maintaining overall quality standards, and such analyses may be critical for clinical trials to be ethically acceptable. Their main drawback is an inflated risk of type I error due to repeated testing. The index of satisfaction approach [10], which uses the p-value of the significance result that would be obtained if the trial continued to its term, can resolve this problem. The prediction of satisfaction has accordingly been advocated as statistical evidence for early termination of phase II clinical trials [12].

In particular, a smaller sample size does not necessarily imply a better design, but the PS-design reduces the sample size relative to the frequentist approach (the reduction ranges from roughly 10% to 50%). Indeed, for a given prior distribution, the sample size required by the PS-design may decline as θ1 increases. For example, when θ1 = θ0 + 0.10 with θ0 = 0.5, CV = 15%, type I error = 0.10, and power = 0.80, the sample size is Nmax = 108, but for θ1 = θ0 + 0.20 with the same other parameters it decreases to 29 (the case of a rare promising treatment). Ideally, Nmax should decrease as θ1 increases.
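The qualitative claim that the required sample size shrinks as θ1 moves away from θ0 can be verified with a simple frequentist single-stage search, shown below as a baseline sketch (the PS-design's own sample-size rule is different, so the exact values 108 and 29 are not expected to be reproduced here).

```python
from math import comb

def upper_tails(n, p):
    # tails[r] = P(X >= r) for X ~ Binomial(n, p), for r = 0..n+1
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    tails = [0.0] * (n + 2)
    for k in range(n, -1, -1):
        tails[k] = tails[k + 1] + pmf[k]
    return tails

def single_stage_design(theta0, theta1, alpha=0.10, power=0.80, n_cap=400):
    # Smallest n with a rejection cutoff r (reject H0 if X >= r) such that
    # P(X >= r | theta0) <= alpha and P(X >= r | theta1) >= power.
    for n in range(1, n_cap + 1):
        t0, t1 = upper_tails(n, theta0), upper_tails(n, theta1)
        # the smallest r controlling the type I error maximizes power
        r = next(r for r in range(n + 2) if t0[r] <= alpha)
        if t1[r] >= power:
            return n, r
    return None
```

For θ0 = 0.5, detecting θ1 = 0.7 requires far fewer patients than detecting θ1 = 0.6, mirroring the decline described above.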

Inference based on the prediction of satisfaction combines the advantages of Bayesian and frequentist inference: it conditions on the observed data, incorporates prior information, strikes a logical compromise between the relative weight of the accumulated data and the prior, and allows significance levels and power to be assessed over the sample space of future observations. In the sensitivity analysis section, we showed that across the different stages the PS-design generally has a larger stopping boundary when the type I error rate is smaller, which inflates the type II error probability; a larger cohort size, nk, is then needed to control the type II error. Thus, although the probability of early termination under the null is higher, the PS-design still has a larger expected sample size for a small type I error rate, while remaining below that of a one-stage design.

We have summarized the perspective of this hybrid frequentist-Bayesian sequential design and discussed its implications. In addition, we evaluated many experimental situations to determine its simulated operating characteristics. To analyse this influence, we conducted replicated simulations demonstrating how optimality is connected to the intricate interplay between the type I error, the threshold γ, the sample size, and the prior information. Overall, the prediction of satisfaction design achieves a satisfactory sensitivity analysis provided that a sufficient number of responses is observed during the recruiting phase for the adaptive technique to work. Furthermore, even in experimental scenarios with no prior information, a high level of satisfaction is reached in favour of the most promising therapy. The feasibility of our concept was further demonstrated by redesigning an ongoing real lung cancer immunotherapy trial.

From a practical standpoint, our design does not require intensive computation for the sample size search, and there are few parameters to control. Furthermore, only a simple simulation is required to evaluate the operating characteristics at the design phase. Moreover, owing to the strict requirement of using the p-value of the final significance conclusion, the stopping boundaries in the different plans are larger, so the probability of early termination is greater than in other trial designs such as the PP-design, as shown in the illustrative example. The authors are currently developing a user-friendly interface (an R package) implementing the technique, to be released in the near future.

In conclusion, we believe that the prediction of satisfaction design should be seen as an alternative among the usual adaptive designs for interim futility analysis in single-arm phase II trials. It allows us to answer the experimenter's questions about the stability of already-observed results with respect to future data, and it can be used to advantage in the context of sequential interim analysis. This approach appears particularly relevant for determining the sample size required to reach the desired conclusion.

Acknowledgments

We are extremely grateful to the Associate Editor and the anonymous reviewers for providing insights regarding the behavior of the proposed design and for their constructive comments.

References

  1. Gsponer T, Gerber F, Bornkamp B, Ohlssen D, Vandemeulebroecke M, Schmidli H. A practical guide to Bayesian group sequential designs. Pharm. Stat. 2014; 13: 71–80. pmid:24038922
  2. Spiegelhalter DJ, Freedman LS. Bayesian approaches to clinical trials. Bayesian Statistics. 1988; 3: 453–477.
  3. Saville BR, Connor JT, Ayers GD, Alvarez J. The utility of Bayesian predictive probabilities for interim monitoring of clinical trials. Clin. Trials. 2014: 1–9. pmid:24872363
  4. Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC, Boca Raton; 2000.
  5. Park Y. Optimal two-stage design of single-arm phase II clinical trials based on median event time test. PLoS ONE. 2021; 16(2): e0246448. pmid:33556130
  6. Chen DT, Schell MJ, Fulp WJ, Pettersson F, Kim S, Gray JE, Haura EB. Application of Bayesian predictive probability for interim futility analysis in single-arm phase II trial. Transl. Cancer Res. 2019; 8: 404–420. https://dx.doi.org/10.21037/tcr.2019.05.17
  7. Thall PF, Simon R. Practical Bayesian guidelines for phase IIB clinical trials. Biometrics. 1994; 50: 337–349.
  8. Lee JJ, Liu DD. A predictive probability design for phase II cancer clinical trials. Clinical Trials. 2008; 5: 93–106. pmid:18375647
  9. Sambucini V. Bayesian sequential monitoring of single-arm trials: a comparison of futility rules based on binary data. Int. J. Environ. Res. Public Health. 2021; 18: 8816. pmid:34444562
  10. Djeridi Z, Merabet H. A hybrid Bayesian-frequentist predictive design for monitoring multi-stage clinical trials. Sequential Analysis. 2019; 38(3): 301–317.
  11. Merabet H. Index and prevision of satisfaction in exponential models for clinical trials. Statistica. 2004; 3: 441–453.
  12. Merabet H, Labdaoui A, Druilhet P. Bayesian prediction for two-stage sequential analysis in clinical trials. Communications in Statistics—Theory and Methods. 2017; 46(19): 9807–9816.
  13. Holst E, Thyregod P, Wilrich PT. On conformity testing and the use of two-stage procedures. Int. Stat. Rev. 2001; 69: 419–432.
  14. Aitchison J, Dunsmore IR. Statistical Prediction Analysis. Cambridge University Press, New York; 1975.
  15. Rahman R, Alam MI. Stopping for efficacy in single-arm phase II clinical trials. Journal of Applied Statistics. 2022; 49(10): 2447–2466. pmid:35757036
  16. Herson J. Predictive probability early termination plan for phase II clinical trials. Biometrics. 1979; 35: 775–783. pmid:526523
  17. Bartoš F, Aust F, Haaf JM. Informed Bayesian survival analysis. BMC Med. Res. Methodol. 2022; 22: 238. pmid:36088281