Bayesian credible subgroup identification for treatment effectiveness in time-to-event data

Because patients respond differently to pharmacotherapy, drug development and medical practice are concerned with personalized medicine, which includes identifying subgroups of the population that exhibit differential treatment effects. For time-to-event data, available methods focus only on detecting and testing treatment-by-covariate interactions and may not account for multiplicity. In this work, we introduce the Bayesian credible subgroups approach for time-to-event endpoints. It provides two bounding subgroups for the true benefiting subgroup: one that is likely to be contained in the benefiting subgroup and one that is likely to contain it. The personalized treatment effect is estimated using two common measures for survival data: the hazard ratio and the restricted mean survival time. We apply the method to identify benefiting subgroups in a case study of prostate carcinoma patients and in a large simulated clinical dataset.

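2 Relationship of log hazard ratio (log HR) and a difference in restricted mean survival time (RMSTd)

For each subject $i = 1, \ldots, n$, let $x_i$ and $z_i$ be $p \times 1$ and $q \times 1$ vectors of prognostic and predictive covariates, respectively. Furthermore, let $\theta_i \in \{0, 1\}$ be the treatment indicator, i.e. $\theta_i = 1$ indicates that the $i$-th subject receives treatment. A Cox proportional hazards model assumes that

$$\lambda(t \mid x_i, z_i, \theta_i) = \lambda_0(t) \exp\left(x_i^\top \beta + \theta_i z_i^\top \gamma\right),$$

where $\beta$ and $\gamma$ are $p \times 1$ and $q \times 1$ vectors of regression coefficients, and $\lambda_0(t) > 0$ is the baseline hazard function. It follows that the survival function can be expressed as

$$S(t \mid x_i, z_i, \theta_i) = S_0(t)^{\exp\left(x_i^\top \beta + \theta_i z_i^\top \gamma\right)},$$

where $S_0(t) = \exp\left\{-\int_0^t \lambda_0(u)\,du\right\}$ is the baseline survival function. For the $i$-th subject, the personalized treatment effect (PTE) as HR is defined as

$$\Delta_{H,i} = \frac{\lambda(t \mid x_i, z_i, \theta_i = 1)}{\lambda(t \mid x_i, z_i, \theta_i = 0)} = \exp\left(z_i^\top \gamma\right),$$

and the PTE as RMSTd up to time $t^* > 0$ is expressed as

$$\Delta_{Rd,i} = \int_0^{t^*} \left\{ S(t \mid x_i, z_i, \theta_i = 1) - S(t \mid x_i, z_i, \theta_i = 0) \right\} dt = \int_0^{t^*} \left\{ S_0(t)^{\exp(x_i^\top \beta)\,\Delta_{H,i}} - S_0(t)^{\exp(x_i^\top \beta)} \right\} dt.$$

This shows that when $\Delta_{H,i} = 1$, or equivalently $\log \Delta_{H,i} = 0$, the two terms in the integrand coincide and $\Delta_{Rd,i} = 0$. Therefore, we set $\delta_H = 1$ and $\delta_{Rd} = 0$ in our simulation studies so that we can compare the performance of these approaches for finding credible subgroup pairs.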
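The correspondence between the two thresholds can be checked numerically. The following is a minimal sketch, not the implementation used in the manuscript: it assumes a constant baseline hazard (the value `base_rate` is arbitrary), takes RMSTd as treated minus control, and uses hypothetical function names and toy covariate and coefficient values.

```python
import numpy as np

def survival(t, lin_pred, base_rate=0.1):
    """S(t) = S_0(t) ** exp(lin_pred), with S_0(t) = exp(-base_rate * t).

    The constant baseline hazard `base_rate` is an illustrative assumption.
    """
    return np.exp(-base_rate * t) ** np.exp(lin_pred)

def hazard_ratio(z, gamma):
    """PTE as hazard ratio: exp(z' gamma)."""
    return float(np.exp(np.dot(z, gamma)))

def rmst_difference(x, z, beta, gamma, t_star, n_grid=2001):
    """PTE as RMST difference (treated minus control) on [0, t_star]."""
    t = np.linspace(0.0, t_star, n_grid)
    eta_control = np.dot(x, beta)                  # theta = 0
    eta_treated = eta_control + np.dot(z, gamma)   # theta = 1
    diff = survival(t, eta_treated) - survival(t, eta_control)
    dt = t[1] - t[0]
    # Trapezoidal rule written out by hand on the uniform grid.
    return float(dt * (diff.sum() - 0.5 * (diff[0] + diff[-1])))

# Toy values (hypothetical).
x = np.array([1.0, 0.5])
beta = np.array([0.3, -0.2])
z = np.array([1.0, 2.0])
gamma_null = np.array([0.4, -0.2])     # z' gamma = 0, so HR = 1
gamma_benefit = np.array([-0.5, 0.0])  # z' gamma < 0, so HR < 1

hr_null = hazard_ratio(z, gamma_null)
rd_null = rmst_difference(x, z, beta, gamma_null, t_star=10.0)
hr_benefit = hazard_ratio(z, gamma_benefit)
rd_benefit = rmst_difference(x, z, beta, gamma_benefit, t_star=10.0)
```

In this sketch a hazard ratio of exactly 1 gives an RMST difference of 0, and a hazard ratio below 1 gives a positive RMST difference, whatever the prognostic term.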
3 Algorithms for constructing the posterior $\Delta_H$ and $\Delta_{Rd}$

3.1 Algorithms for constructing the posterior $\Delta_H$

As defined in Section 2.2 of the manuscript, the hazard ratio (HR) as a PTE for a patient with covariate $z$ is

$$\Delta_H(z) = \frac{\lambda(t \mid x, z, \theta = 1)}{\lambda(t \mid x, z, \theta = 0)} = \exp\left(z^\top \gamma\right),$$

which is the ratio of the hazards of a patient with treatment ($\theta = 1$) and without treatment ($\theta = 0$). Equivalently, $\log \Delta_H(z_i) = z_i^\top \gamma$ is the log HR evaluated at the point $z_i$. Following the notation in Section 2.4.2, the algorithm for constructing the posterior of $\Delta_H$ is presented below.

Algorithm 1 Constructing the posterior $\Delta_H$
1: Construct the covariate space grid $\Theta$ for $z$.
2: Initialize $\boldsymbol{\beta}^{(0)} = (\beta^{(0)}, \gamma^{(0)})^\top$, $h^{(0)}$.
3: for $\ell = 1, 2, \ldots$ do
4-5: … $\exp\Big\{-h_j^{(\ell-1)} \sum_{k \in R_j \setminus M_j} \exp\big(x_k^\top \boldsymbol{\beta}^{(\ell-1)}\big)\Big\} \;\ldots\; + b$ …
6: end for
7: For each covariate point $z_i \in \Theta$, compute the posterior draws $\Delta_H(z_i)^{(\ell)} = \exp\big(z_i^\top \gamma^{(\ell)}\big)$ or $\log \Delta_H(z_i)^{(\ell)} = z_i^\top \gamma^{(\ell)}$, after discarding the burn-in iterations.
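The final step of Algorithm 1 can be sketched as follows. This is a minimal illustration that assumes the draws of γ have already been produced by the sampler; the sampling steps themselves are not reproduced. The function name, the synthetic draws standing in for sampler output, and the grid values are all hypothetical.

```python
import numpy as np

def posterior_hazard_ratio(gamma_draws, z_grid, burn_in=0):
    """Posterior draws of the hazard ratio exp(z' gamma) on a covariate grid.

    gamma_draws: (n_iter, q) array of MCMC draws of gamma (assumed given).
    z_grid:      (n_points, q) array of covariate points.
    Returns an (n_iter - burn_in, n_points) array of hazard-ratio draws.
    """
    kept = np.asarray(gamma_draws)[burn_in:]   # drop burn-in iterations
    log_hr = kept @ np.asarray(z_grid).T       # z_i' gamma for every draw
    return np.exp(log_hr)

# Synthetic draws standing in for real sampler output (hypothetical values).
rng = np.random.default_rng(0)
gamma_draws = rng.normal(loc=[-0.5, 0.1], scale=0.05, size=(1000, 2))
z_grid = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])

hr_draws = posterior_hazard_ratio(gamma_draws, z_grid, burn_in=200)
log_hr_mean = np.log(hr_draws).mean(axis=0)    # posterior mean log HR per point
```

Each column of `hr_draws` is then a sample from the posterior of the hazard ratio at one grid point, which is the input needed to construct credible subgroups.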

Algorithms for constructing the posterior $\Delta_{Rd}$
From Section 2.3 of the manuscript, the RMST difference between the two arms from $t = 0$ to $t = \nu$ is defined as

$$\Delta_{Rd} = \int_0^{\nu} \left\{ S(t \mid x, z, \theta = 1) - S(t \mid x, z, \theta = 0) \right\} dt,$$

which is the difference in area under the two survival curves. We employ the conventional Cox proportional hazards model to estimate these two survival functions. Following the notation in Section 2.4.2, the algorithm for constructing the posterior of $\Delta_{Rd}$ is presented below.

Algorithm 2 Constructing the posterior $\Delta_{Rd}$
1: Construct the covariate space grid $\Theta_0$ for $\mathbf{x} = \{x, z\}$ and $\theta = 0$.
2: Set $\Theta_1 = \Theta_0$ and replace $\theta$ by $1 - \theta$.
3: Initialize $\boldsymbol{\beta}^{(0)} = (\beta^{(0)}, \gamma^{(0)})^\top$, $h^{(0)}$.
4: for $\ell = 1, 2, \ldots$ do
5: …

Simulation study under proportional hazard assumption

4.1 Simulation 1: both log HR and RMSTd.
For each simulated dataset, we evaluate the Bayesian credible subgroups method using log HR and RMSTd. We use thresholds of 1 for HR and 0 for RMSTd to identify benefiting subjects. Tables 1-4 present the average summary statistics for different credible levels. The total coverage is always greater than the credible level. The credible pair size decreases as the sample size and/or effect size increases. The sensitivity of D increases with the sample size. HR has higher sensitivity of D than RMSTd, but both approaches have similar specificity of D. Lastly, RMSTd has a smaller effect MSE than HR for large sample sizes (n = 500, 1000) and a slightly larger one for small sample sizes (n = 50, 100). Overall, Tables 1-4 show that, as expected, the two approaches give similar results.

Here we evaluate the Bayesian credible subgroups method using log HR to identify the benefiting subjects.

Simulation 3: for RMST differences.
Similarly, we evaluate the Bayesian credible subgroups method using RMSTd to identify the benefiting subjects at different values of $\delta_{Rd}$ and different credible levels. Table 10 shows that, for $\delta_{Rd} = 0$, credible subgroups constructed with RMSTd at higher credible levels have higher specificity of D and lower sensitivity of D. Tables 9 and 11 show that the RMSTd approach performs well for $\delta_{Rd} = 0$.
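The summary statistics reported in the simulation tables compare a credible subgroup pair (D, S) with the true benefiting subgroup B. The sketch below shows one plausible way to compute them on a grid of covariate points. The definitions used here are assumptions and may differ in detail from those in the manuscript: coverage is taken as D ⊆ B ⊆ S, pair size as the fraction of grid points in S but not in D, and sensitivity and specificity of D are measured relative to B. The function name and toy masks are hypothetical.

```python
import numpy as np

def credible_pair_summary(D, S, B):
    """Summaries of a credible pair (D, S) against the true subgroup B.

    D, S, B are boolean masks over the same grid of covariate points.
    All definitions below are illustrative assumptions.
    """
    D, S, B = (np.asarray(a, dtype=bool) for a in (D, S, B))
    # Coverage event: D is contained in B, and B is contained in S.
    covered = bool(np.all(~D | B) and np.all(~B | S))
    # Pair size: share of the grid left undecided (in S but not in D).
    pair_size = float(np.mean(S & ~D))
    # Sensitivity of D: share of truly benefiting points captured by D.
    sensitivity = float((D & B).sum() / B.sum()) if B.any() else float("nan")
    # Specificity of D: share of non-benefiting points excluded from D.
    specificity = float((~D & ~B).sum() / (~B).sum()) if (~B).any() else float("nan")
    return {
        "covered": covered,
        "pair_size": pair_size,
        "sensitivity_D": sensitivity,
        "specificity_D": specificity,
    }

# Toy grid of five covariate points (hypothetical).
B = [True, True, True, False, False]
D = [True, True, False, False, False]
S = [True, True, True, True, False]
summary = credible_pair_summary(D, S, B)
```

Averaging the `covered` indicator over simulated datasets would give an estimate of total coverage, and averaging the other entries would give the remaining summaries.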

A large simulated clinical trial dataset
Similarly, Fig 2 shows the posterior densities of the model parameters; the distributions are approximately normal.

Pointwise method
We use the pointwise method as a benchmark against which to compare our proposed method. The pointwise method uses the same Cox regression model as our method, but it does not account for multiplicity in constructing credible subgroups. Precisely, the exclusive credible subgroup D contains the covariate points at which the posterior probability of $\Delta(z) > \delta$ is greater than $1 - \alpha$. The inclusive credible subgroup S contains the covariate points at which the posterior probability of $\Delta(z) \le \delta$ is at most $1 - \alpha$.
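A minimal sketch of the pointwise rule on a grid of covariate points, assuming posterior draws of Δ(z) are already available as an array. The function name and the toy draws are hypothetical. For the inclusive subgroup, the sketch assumes that a point is excluded from S only when the posterior probability of Δ(z) ≤ δ exceeds 1 − α, so that D ⊆ S whenever α ≤ 0.5. It follows the Δ(z) > δ direction used in the text; the inequality would be reversed for an effect measure where smaller values indicate benefit.

```python
import numpy as np

def pointwise_credible_subgroups(effect_draws, delta, alpha):
    """Pointwise credible subgroups without multiplicity adjustment.

    effect_draws: (n_draws, n_points) posterior draws of the effect on a grid.
    Returns boolean masks (D, S) and the pointwise posterior probabilities.
    """
    draws = np.asarray(effect_draws)
    # Posterior probability that the effect exceeds delta, point by point.
    prob_benefit = np.mean(draws > delta, axis=0)
    # Exclusive subgroup: benefit is credible at level 1 - alpha.
    D = prob_benefit > 1.0 - alpha
    # Inclusive subgroup (assumed rule): P(effect <= delta) <= 1 - alpha,
    # which is the same as P(effect > delta) >= alpha.
    S = prob_benefit >= alpha
    return D, S, prob_benefit

# Toy draws with known proportions above delta = 0: 100%, 50%, 0%.
effect_draws = np.column_stack([
    np.full(10, 0.5),
    np.tile([0.5, -0.5], 5),
    np.full(10, -0.5),
])
D, S, prob_benefit = pointwise_credible_subgroups(effect_draws, delta=0.0, alpha=0.2)
```

Because each grid point is judged separately, this rule gives no simultaneous guarantee over the whole grid, which is consistent with the lower total coverage reported for the pointwise method.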
6.1 Simulation study

Table 12 presents the results for the Bayesian credible subgroups obtained with the pointwise method. Log HR and RMSTd yield similar results, and the total coverage is mostly smaller than 80%. Moving from the pointwise method to our proposed method increases the credible pair size and the specificity of D but decreases the sensitivity of D.

Prostate cancer dataset
The left panel of Fig 3 shows the credible subgroups for prostate cancer patients using log HR at a credible level of 95%. We used the same value $\delta_H = 1$ to define subgroups as in our manuscript. Similarly, the right panel shows the credible subgroups using RMSTd at a credible level of 95% and $\delta_{Rd} = 0$. For patients both with and without bone metastasis, the pointwise method provides a tighter uncertainty region for both $\Delta_H$ and $\Delta_{Rd}$ than our proposed method does. As a result, the exclusive credible subgroup D from the pointwise method is larger than that from our proposed method. In addition, the RMSTd approach provides a larger exclusive subgroup D than the log HR approach does.

A large simulated clinical trial dataset
Fig 4 and Fig 5 show the credible subgroups using log HR and the difference in RMST, respectively. We used the same settings to define subgroups as in our manuscript. The results show that the pointwise method provides a tighter uncertainty region for both $\Delta_H$ and $\Delta_{Rd}$ than our proposed method does. As a result, the exclusive credible subgroup D from the pointwise method is larger than that from our proposed method.

7 Simulation study under nonproportional hazard assumption

Tables 13-15 present the diagnostic results for RMSTd under the nonproportional hazard assumption. As the sample size increases, the total coverage, the sensitivity of D, and the specificity of D increase, while the credible pair size and the MSE decrease. Moreover, more conservative coverage gives lower sensitivity of D but higher specificity of D.