Sine-G family of distributions in Bayesian survival modeling: A baseline hazard approach for proportional hazard regression with application to right-censored oncology datasets using R and STAN

Abdisalam Hassan Muse; Amani Almohaimeed; Hana N. Alqifari; Christophe Chesneau

doi:10.1371/journal.pone.0307410

Abstract

In medical research and clinical practice, Bayesian survival modeling is a powerful technique for assessing time-to-event data. It allows for the incorporation of prior knowledge about the model’s parameters and provides a more comprehensive understanding of the underlying hazard rate function. In this paper, we propose a Bayesian survival modeling strategy for proportional hazards regression models that employs the Sine-G family of distributions as baseline hazards. The Sine-G family contains flexible distributions that can capture a wide range of hazard forms, including increasing, decreasing, and bathtub-shaped hazards. In order to capture the underlying hazard rate function, we examine the flexibility and effectiveness of several distributions within the Sine-G family, such as the Gompertz, Lomax, Weibull, and exponentiated exponential distributions. The proposed approach is implemented using the R programming language and the STAN probabilistic programming framework. To evaluate the proposed approach, we use a right-censored survival dataset of gastric cancer patients, which allows for precise determination of the hazard rate function while accounting for censoring. The Watanabe Akaike information criterion and the leave-one-out information criterion are employed to evaluate the performance of various baseline hazards.

Citation: Muse AH, Almohaimeed A, Alqifari HN, Chesneau C (2025) Sine-G family of distributions in Bayesian survival modeling: A baseline hazard approach for proportional hazard regression with application to right-censored oncology datasets using R and STAN. PLoS ONE 20(3): e0307410. https://doi.org/10.1371/journal.pone.0307410

Editor: Oluwafemi Samson Balogun, University of Eastern Finland, FINLAND

Received: November 28, 2023; Accepted: July 4, 2024; Published: March 13, 2025

Copyright: © 2025 Muse et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: In this analysis, we examine the gastric cancer data collected from the Gastrointestinal Tumor Study Group (1982). This dataset has been widely utilized in studies focusing on crossing survival curves, particularly in the field of survival analysis. Some notable studies that have utilized this dataset include Demarqui and Mayrink, Muse et al., and Diao et al. The dataset, labeled "gastric," is freely accessible through the R package AmoudSurv. The clinical trial in oncology comprises 90 patients diagnosed with locally advanced gastric cancer. These patients were randomly divided into two groups: (i) a control group consisting of 45 patients who received chemotherapy, and (ii) a treatment group consisting of 45 patients who underwent radiation therapy in addition to chemotherapy. The study followed these patients for a duration of approximately 5 years. Each patient’s data in the dataset includes three variables: the response time, which indicates either failure (time to death) or right censoring; a binary failure indicator identifying patients who experienced the event of interest; and a binary group indicator (1) indicating the type of treatment received. The dataset is also available to the public for download from https://doi.org/10.1002/1097-0142(19820501)49:9<1771::AID-CNCR2820490907>3.0.CO;2-M.

Funding: Deanship of Scientific Research, Qassim University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: CDF, cumulative distribution function; CHRF, cumulative hazard rate function; ELPD, expected log predictive density; HMC, Hamiltonian Monte Carlo; HRF, hazard rate function; LOOIC, leave-one-out information criterion; PDF, probability density function; PH, proportional hazard; SEE-PH, sine-exponentiated exponential proportional hazard; SEE, sine-exponentiated exponential; SF, survival function; SG-PH, sine-Gompertz proportional hazard; SG, sine-Gompertz; SL-PH, sine-Lomax proportional hazard; SL, sine-Lomax; SW-PH, sine-Weibull proportional hazard; SW, sine-Weibull; WAIC, Watanabe Akaike information criterion

1 Introduction

1.1 Background of the study

In survival analysis, it is important to accurately model the underlying hazard rate function (HRF) in order to estimate individual survival probabilities and arrive at informed treatment decisions. Nevertheless, the complex structure of real-world survival data is often difficult to represent with traditional survival analysis techniques. A common option for modeling survival data is the proportional hazard (PH) regression model, which allows covariates that may affect the HRF to be included. According to [1, 2], the PH model’s popularity stems from the ease with which it handles technical problems such as censoring and truncation, and from the appealing interpretation of the HRF as a risk that varies with time. In the traditional parametric PH model, the baseline HRF is assumed to follow a specified distribution, such as the Weibull, exponential, or Gompertz distribution.

On the other hand, Bayesian survival modeling is an effective method for predicting the time to an event such as death, illness development, or system failure. It can produce more reliable estimates and more meaningful conclusions by allowing the model to incorporate parameter uncertainty and prior knowledge. [3] pointed out that using classical models to perform survival analysis may result in inaccurate conclusions when there are insufficient events or when the effective sample size is small; in such cases, Bayesian analysis can yield more reliable results.

In recent years, distribution theory has been one of the most active branches of probability and statistics. In particular, researchers have proposed several forms of statistical distributions based on trigonometric transformations in order to model data and identify the best-suited distribution in statistical theory and practice. [4] reviewed and compared the latest contributions made through the various trigonometric families of distributions, their properties, and their application in data modeling. The main trigonometric family is the sine-generated (Sine-G) family introduced in [5], which demonstrates how the sine function can be used efficiently to modify or enhance the modeling power of a baseline distribution. More investigations have been conducted in this direction. For instance, [6] proposed a new family based on a sine transformation, known as the new Sine-G family, which has been shown to have significant features. Furthermore, [7] demonstrated that the transformed sine-Weibull (TSW) distribution, which belongs to the Sine-G family, holds great potential for lifetime data analysis and modeling. Using a composition scheme, [8] introduced another extension of the Sine-G family called the Sine Kumaraswamy-G family and demonstrated its potential and robustness. The exponentiated Sine-G family was proposed by [9] and also has important properties for lifetime studies. By combining a trigonometric function with the Marshall-Olkin approach, [10] developed the Marshall-Olkin Sine-G (MOS-G) family.

By investigating the use of trigonometric distributions, specifically the Sine-G family, as baseline hazards for PH regression models, this study aims to introduce a new approach in survival modeling. Trigonometric distributions have specific properties that make them useful for modeling complex HRFs in challenging scenarios, such as right-censored oncology datasets. Combining the Bayesian approach with the flexibility of the Sine-G family is also intended to improve survival predictions and to support decision-making in medical research and clinical practice.

The performance of statistical models, particularly survival models, is frequently evaluated using the Watanabe-Akaike information criterion (WAIC) [11] and the leave-one-out information criterion (LOOIC) [12]. Both are fully Bayesian criteria: they are computed from the full posterior distribution of the model parameters rather than from a point estimate. As a result, they are generally regarded as more reliable than criteria that rely on point estimates or asymptotic approximations, including the Bayesian information criterion (BIC), the deviance information criterion (DIC), and the Akaike information criterion (AIC). See [13, 14] for further information. In the context of the Sine-G family of survival distributions, WAIC and LOOIC can be used to assess the performance of different baseline hazards. The baseline hazard is the hazard rate for a reference individual, that is, an individual whose covariates are all equal to zero. To find the baseline hazard that best fits and predicts the data, we compare the candidate models by WAIC and LOOIC; the preferred model is the one with the lowest WAIC or LOOIC value.
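As a rough illustration of how WAIC is obtained from posterior draws, the following Python sketch computes it on the deviance scale from a matrix of pointwise log-likelihood values. It is not the authors' R/Stan code: the function names and the input layout (`log_lik[s][i]` for draw `s` and observation `i`) are assumptions made for this example, and LOOIC is not covered because it is usually computed with Pareto-smoothed importance sampling (for instance via the R package loo).

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def sample_variance(values):
    """Unbiased sample variance; needs at least two values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def waic(log_lik):
    """WAIC = -2 * (lppd - p_waic).

    log_lik[s][i] is the log-likelihood of observation i under
    posterior draw s (hypothetical layout chosen for this sketch).
    """
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_obs):
        column = [draw[i] for draw in log_lik]
        lppd += log_mean_exp(column)
        p_waic += sample_variance(column)
    return -2.0 * (lppd - p_waic)


if __name__ == "__main__":
    # Two observations, three posterior draws (made-up numbers).
    draws = [[-1.0, -2.0], [-1.2, -2.1], [-0.9, -1.8]]
    print(waic(draws))
```

When every draw gives the same log-likelihood, the penalty term vanishes and the value reduces to minus twice the total log-likelihood; posterior variability in the pointwise log-likelihood increases the penalty.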

1.2 Motivation statement

The motivation for this research paper stems from the limitations of traditional survival analysis techniques in capturing complex hazard rate patterns exhibited in real-world datasets. Existing approaches, such as the use of exponential, Weibull, Lomax, and Gompertz distributions as baseline hazards in proportional hazards (PH) regression, often fail to adequately model non-monotonic hazard rates, including unimodal and bathtub shapes [15–21].

To address this issue, we propose a novel solution by introducing the Sine-G family of distributions as baseline hazards within a Bayesian survival modeling framework. The main motivation behind this approach is the inherent flexibility of the Sine-G family, which can capture a wider range of hazard rate patterns without the need for additional parameters. Our proposed models are capable of accurately modeling both monotonic and non-monotonic hazard rates without adding extra parameters to the baseline distribution or complicating the estimation of the parameters.

By incorporating these trigonometric distributions into survival regression models, we aim to provide a more accurate and comprehensive representation of survival times, particularly in right-censored gastric oncology datasets. The benefits of our proposed models are twofold. Firstly, they offer a more flexible and robust method for capturing complex hazard rate patterns, allowing researchers and practitioners to accurately model a wider range of real-world scenarios. Secondly, our models achieve this enhanced flexibility without the need for additional parameters, simplifying the estimation process and ensuring ease of implementation.

Our research contributes to advancing the field of survival analysis by enhancing the modeling options available for researchers and practitioners. By addressing the limitations of traditional approaches and offering a more comprehensive representation of survival times, our proposed models lead to improved insights and predictions in the study of time-to-event outcomes. Furthermore, the simplicity and effectiveness of our approach make it a valuable tool for various applications in the medical field and beyond.

1.3 Novelty of the study

To the best of the authors’ knowledge, this paper represents the first of its kind in incorporating trigonometric families of distributions as baseline hazards within parametric survival regression models. This novel approach challenges the conventional assumptions and opens up new avenues for understanding the underlying relationships in survival data. Furthermore, this study also pioneers the utilization of a Bayesian approach, specifically using the Stan language, to estimate the parameters and conduct inference in the proposed models. By leveraging the flexibility of Bayesian methods, we can account for uncertainty, incorporate prior knowledge, and obtain more robust and interpretable results.

1.4 Main contributions

The main contributions of this paper can be summarized as follows:

Introducing the Sine-G family of distributions as baseline hazards: By considering trigonometric families of distributions, we provide a new perspective for modeling survival data, potentially capturing complex patterns that may not be adequately represented by traditional distributions.
Developing a baseline hazard approach for PH regression models: We present a methodology to incorporate the Sine-G family of distributions as baseline hazards within parametric survival regression models, allowing for more flexible and tailored modeling of survival times.
Proposing a Bayesian framework using Stan: We demonstrate the applicability of a Bayesian approach for parameter estimation and inference in the proposed models. The use of Stan language enables efficient computation, uncertainty quantification, and the integration of prior knowledge.
Application to right-censored gastric oncology datasets: We apply the proposed methodology to real-world gastric oncology datasets with right-censored observations, showcasing its effectiveness in capturing the underlying survival patterns and providing valuable insights for clinical research and decision-making.

Overall, this research contributes to the advancement of survival analysis by introducing novel modeling techniques, leveraging Bayesian inference, and providing a comprehensive analysis of survival times in the context of gastric oncology.

1.5 Outline of the paper

This paper presents a comprehensive study on Bayesian survival modeling using the Sine-G family of distributions as baseline hazards for PH regression models. Section 2 discusses the functional basis of the Sine-G family, the different baseline hazards considered, and the PH regression model. The Bayesian methodology for the proposed framework is presented in Section 3. Using a gastric cancer right-censored survival dataset, we assess the effectiveness of different baseline hazards for the Sine-G family in Section 4. To select the best-fitting distribution for capturing the dataset’s complex survival dynamics, we apply WAIC and LOOIC. Section 5 summarizes the findings, comparison, and evaluation of the different baseline hazards. Finally, in Sections 6 and 7, we present conclusions, recommendations, and future study areas, underscoring the value of the proposed approach in improving survival predictions and developing decision-making in medical research and clinical practice.

2 Sine-G distributions family

2.1 Main functions

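Modeling a wide variety of HRFs is possible with the flexible distributions of the Sine-G family. This was shown in [5, 8, 22] and in the related references mentioned in the introductory section. With respect to time t, let us now define the various possible structures using the HRF, survival function (SF), cumulative HRF (CHRF), and odds function. To begin, the CDF associated with the Sine-G family is

$$F(t; \theta) = \int_{0}^{\frac{\pi}{2} G(t; \theta)} \cos(x)\, dx, \tag{1}$$

which is the integral of a basic cosine probability density function (PDF), upon composition with the CDF of the baseline distribution, denoted G(t; θ). The symbol θ represents all the parameters on which the baseline distribution depends; it can be considered a vector in this article. After a simple integration, Eq (1) can be written as

$$F(t; \theta) = \sin\left(\frac{\pi}{2} G(t; \theta)\right). \tag{2}$$

If the PDF associated with G(t; θ) is denoted as g(t; θ), the PDF of the family is given by

$$f(t; \theta) = \frac{\pi}{2}\, g(t; \theta) \cos\left(\frac{\pi}{2} G(t; \theta)\right). \tag{3}$$

Directly, the SF is given by

$$S(t; \theta) = 1 - \sin\left(\frac{\pi}{2} G(t; \theta)\right). \tag{4}$$

Based on the SF and PDF, the HRF is specified by the following ratio function:

$$h(t; \theta) = \frac{f(t; \theta)}{S(t; \theta)} = \frac{\frac{\pi}{2}\, g(t; \theta) \cos\left(\frac{\pi}{2} G(t; \theta)\right)}{1 - \sin\left(\frac{\pi}{2} G(t; \theta)\right)}. \tag{5}$$

The CHRF is given by

$$H(t; \theta) = -\log S(t; \theta) = -\log\left[1 - \sin\left(\frac{\pi}{2} G(t; \theta)\right)\right]. \tag{6}$$

To end this part, the odds function is given by the following ratio function:

$$O(t; \theta) = \frac{F(t; \theta)}{S(t; \theta)} = \frac{\sin\left(\frac{\pi}{2} G(t; \theta)\right)}{1 - \sin\left(\frac{\pi}{2} G(t; \theta)\right)}. \tag{7}$$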

Several baseline distributions will be considered in our findings, and the associated sine distribution will be investigated.
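The main functions of the family depend on the baseline only through G(t; θ) and g(t; θ), so they can be sketched generically. The Python snippet below is an illustration rather than the authors' R/Stan implementation. It uses the Sine-G definitions F = sin((π/2)G), f = (π/2) g cos((π/2)G), S = 1 − F, h = f/S, H = −log S, and odds F/S; the function names and the exponential baseline written with a rate parameter `lam` (G(t) = 1 − exp(−lam·t)) are assumptions for the example.

```python
import math

HALF_PI = math.pi / 2.0


def sine_g_cdf(G):
    """CDF of the Sine-G family given the baseline CDF value G."""
    return math.sin(HALF_PI * G)


def sine_g_pdf(G, g):
    """PDF given the baseline CDF value G and baseline PDF value g."""
    return HALF_PI * g * math.cos(HALF_PI * G)


def sine_g_sf(G):
    """Survival function: 1 - CDF."""
    return 1.0 - math.sin(HALF_PI * G)


def sine_g_hazard(G, g):
    """Hazard rate: PDF divided by survival function."""
    return sine_g_pdf(G, g) / sine_g_sf(G)


def sine_g_cumhaz(G):
    """Cumulative hazard: minus the log of the survival function."""
    return -math.log(sine_g_sf(G))


def sine_g_odds(G):
    """Odds function: CDF divided by survival function."""
    return sine_g_cdf(G) / sine_g_sf(G)


# Illustrative baseline (assumption): exponential with rate lam.
def exp_cdf(t, lam):
    return 1.0 - math.exp(-lam * t)


def exp_pdf(t, lam):
    return lam * math.exp(-lam * t)


if __name__ == "__main__":
    t, lam = 1.3, 0.7
    G, g = exp_cdf(t, lam), exp_pdf(t, lam)
    print("F:", sine_g_cdf(G), "S:", sine_g_sf(G))
    print("h:", sine_g_hazard(G, g), "H:", sine_g_cumhaz(G))
```

For values of G extremely close to 1, the expression 1 − sin((π/2)G) loses precision, so a production implementation would likely work on the log scale; the sketch ignores this.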

2.2 Exponential baseline distribution

To begin, suppose the duration of survival, say modeled by a random variable T to fix the notation, is governed by the exponential distribution characterized by a scale parameter (λ). In this context, the CDF and PDF of the exponential distribution can be expressed as (8) and (9) respectively. It is understood that G(t;λ) = g(t;λ) = 0 for t < 0. We will omit such complementary value functions in the rest of the study for the sake of simplicity in exposition.

The sine-exponential (SE) distribution is derived by incorporating the exponential distribution as the baseline distribution, defined by G(t; θ), into the Sine-G family. Consequently, the CDF, PDF, SF, CHRF, HRF, and odds function of the SE distribution can be formulated as follows: (10) (11) (12) (13) (14) and (15) respectively.

2.3 Weibull baseline distribution

We now consider a more general scenario than the exponential baseline distribution. Suppose the duration of survival is governed by the Weibull distribution characterized by two parameters: a scale parameter (λ) and a shape parameter (α). In this context, the CDF and PDF of the Weibull distribution can be written as follows: (16) and (17) respectively.

As with the SE distribution, the sine-Weibull (SW) distribution is derived by incorporating the Weibull distribution as the baseline distribution, defined by G(t; θ), into the Sine-G family. Consequently, the CDF, PDF, SF, CHRF, HRF, and odds function of the SW distribution can be specified as follows: (18) (19) (20) (21) (22) and (23) respectively.

2.4 Lomax baseline distribution

A polynomial decay baseline distribution is now considered. Suppose the duration of survival is governed by the Lomax distribution characterized by two parameters: a scale parameter (λ) and a shape parameter (α). In this context, the CDF and PDF of the Lomax distribution can be expressed as follows: (24) and (25) respectively.

As for the previous sine-generated distributions, the sine-Lomax (SL) distribution is derived by incorporating the Lomax distribution as the baseline distribution, defined by G(t; θ), into the Sine-G family. Consequently, the CDF, PDF, SF, CHRF, HRF, and odds function of the SL distribution can be indicated as follows: (26) (27) (28) (29) (30) and (31) respectively.

2.5 Exponentiated exponential baseline distribution

We now examine another extension of the exponential distribution. Suppose the duration of survival is governed by the exponentiated exponential distribution characterized by two parameters: a scale parameter (λ) and a shape parameter (α). In this context, the CDF and PDF of the exponentiated exponential distribution can be expressed as follows: (32) and (33) respectively.

The sine-exponentiated exponential (SEE) distribution is derived by incorporating the exponentiated exponential distribution as the baseline distribution, defined by G(t; θ), into the Sine-G family. Consequently, the CDF, PDF, SF, CHRF, HRF, and odds function of the SEE distribution can be written as follows: (34) (35) (36) (37) (38) and (39) respectively.

2.6 Gompertz baseline distribution

Another baseline is now considered, based on the Gompertz distribution. Thus, suppose the duration of survival is governed by this distribution characterized by two parameters: a scale parameter (λ) and a shape parameter (α). In this setting, the CDF and PDF of the Gompertz distribution can be written as follows: (40) and (41) respectively.

The sine-Gompertz (SG) distribution is derived by incorporating the Gompertz distribution as the baseline distribution, defined by G(t; θ), into the Sine-G family. Consequently, the CDF, PDF, SF, CHRF, HRF, and odds function of the SG distribution can be expressed as follows: (42) (43) (44) (45) (46) and (47) respectively.

2.7 Shapes of HRF for one of the proposed baseline distributions

In this sub-section, we showcase various shapes of the PDF (Fig 1) and of the HRF (Fig 2) for the SW baseline distribution. The HRF shapes encompass six distinct patterns, namely constant, increasing, decreasing, unimodal, bathtub, and modified bathtub shapes, as presented in Fig 3.

Fig 1. PDF shapes for the SW baseline distribution.

https://doi.org/10.1371/journal.pone.0307410.g001

Fig 2. HRF shapes for the SW baseline distribution.

https://doi.org/10.1371/journal.pone.0307410.g002

Fig 3. Six patterns of the HRF shapes for the SW baseline distribution.

https://doi.org/10.1371/journal.pone.0307410.g003
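HRF shapes such as those in the figures can be explored numerically by evaluating the hazard on a time grid and inspecting the sign pattern of successive differences. The Python sketch below does this for a sine-Weibull hazard. It is only an illustration: the Weibull parametrization G(t) = 1 − exp(−(λt)^α) is an assumption (the paper's parametrization may differ), the function names are hypothetical, and a grid-based classification can miss features that lie outside the grid.

```python
import math


def sine_weibull_hazard(t, lam, alpha):
    """Sine-Weibull hazard under the ASSUMED baseline
    G(t) = 1 - exp(-(lam * t) ** alpha).

    Writing u = 1 - G(t), the Sine-G hazard
    (pi/2) g cos(pi G / 2) / (1 - sin(pi G / 2))
    simplifies to (pi/2) g / tan(pi u / 4), which avoids the
    cancellation in 1 - sin(.) when G is close to 1.
    """
    z = (lam * t) ** alpha
    u = math.exp(-z)                                    # baseline survival
    g = alpha * lam * (lam * t) ** (alpha - 1.0) * u    # baseline PDF
    return (math.pi / 2.0) * g / math.tan(math.pi * u / 4.0)


def classify_shape(values, tol=1e-12):
    """Label a sequence by the sign pattern of its successive differences."""
    signs = []
    for a, b in zip(values, values[1:]):
        d = b - a
        if abs(d) <= tol:
            continue
        s = 1 if d > 0 else -1
        if not signs or signs[-1] != s:
            signs.append(s)
    labels = {
        (): "constant",
        (1,): "increasing",
        (-1,): "decreasing",
        (1, -1): "unimodal",
        (-1, 1): "bathtub",
    }
    return labels.get(tuple(signs), "other")


if __name__ == "__main__":
    grid = [0.05 * k for k in range(1, 61)]  # t from 0.05 to 3.0
    for alpha in (0.5, 1.0, 2.0):
        hazard = [sine_weibull_hazard(t, 1.0, alpha) for t in grid]
        print(alpha, classify_shape(hazard))
```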

3 PH model

We are now in a position to introduce the proposed PH model and also those derived from the above sine-generated distributions.

3.1 Formulation

The Cox PH model, which is based on hazard rates and was introduced by Cox in 1972 [23], is a widely recognized regression model in survival analysis [24–27]. In this model, the HRF is multiplicatively influenced by covariates. Various researchers have conducted studies on the parametric PH model using different baseline distributions and inferential techniques. Rezaei et al. [28] developed and evaluated an extended exponential geometric baseline distribution for the parametric PH model. Khan and Khosa [15] introduced a parametric PH model with a generalized log-logistic (GLL) baseline distribution. Balakrishnan et al. [29] developed an extension of the PH model and a reversed PH model using the Marshall-Olkin baseline hazard. Muse et al. [30] investigated the Bayesian approaches of the PH model with a GLL baseline hazard.

The PH regression model’s HRF, SF, and CHRF can be expressed as follows:

$$h(t \mid x) = h_0(t; \theta) \exp(x'\beta), \tag{48}$$

$$S(t \mid x) = \left[S_0(t; \theta)\right]^{\exp(x'\beta)}, \tag{49}$$

and

$$H(t \mid x) = H_0(t; \theta) \exp(x'\beta), \tag{50}$$

where h₀(t; θ), S₀(t; θ), and H₀(t; θ) are the baseline hazard rate, survival, and cumulative hazard rate functions, θ represents all the parameters on which the baseline hazard depends, x′ = (x₁, x₂, …, x_J) denotes the vector of the J covariates in the survival model, and β is the vector of unknown regression coefficients, which excludes an intercept term to ensure model identifiability.
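The multiplicative role of the covariates can be made concrete with a short sketch. The Python code below is an illustration, not the authors' R/Stan implementation. It wraps arbitrary baseline functions into PH versions, using h(t|x) = h₀(t) exp(x′β), S(t|x) = S₀(t)^exp(x′β), and H(t|x) = H₀(t) exp(x′β). The sine-exponential baseline with rate `lam` and all function names are assumptions made for the example.

```python
import math

HALF_PI = math.pi / 2.0


def linear_predictor(x, beta):
    """x'beta for one individual (no intercept)."""
    return sum(xj * bj for xj, bj in zip(x, beta))


def ph_hazard(t, x, beta, h0):
    return h0(t) * math.exp(linear_predictor(x, beta))


def ph_survival(t, x, beta, S0):
    return S0(t) ** math.exp(linear_predictor(x, beta))


def ph_cumhaz(t, x, beta, H0):
    return H0(t) * math.exp(linear_predictor(x, beta))


# Illustrative sine-exponential baseline (assumed rate parametrization).
def make_sine_exponential(lam):
    def G(t):
        return 1.0 - math.exp(-lam * t)

    def g(t):
        return lam * math.exp(-lam * t)

    def S0(t):
        return 1.0 - math.sin(HALF_PI * G(t))

    def h0(t):
        return HALF_PI * g(t) * math.cos(HALF_PI * G(t)) / S0(t)

    def H0(t):
        return -math.log(S0(t))

    return h0, S0, H0


if __name__ == "__main__":
    h0, S0, H0 = make_sine_exponential(lam=0.5)
    beta = [0.8]  # hypothetical treatment effect
    for group in ([0.0], [1.0]):
        print(group, ph_hazard(2.0, group, beta, h0),
              ph_survival(2.0, group, beta, S0))
```

The ratio of the two printed hazards equals exp(0.8) at every time point, which is the defining property of the PH model.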

3.1.1 Likelihood function for the PH model.

Suppose we have a random sample of size n and observe (t_i, δ_i, x_i), i = 1, …, n, where t_i represents the observed lifetime for the i-th individual, δ_i denotes the censoring status (1 if the event of interest has occurred, 0 otherwise), and x_i represents the explanatory variables. Regarding the PH model (48), the likelihood function can be expressed as follows:

$$L(\theta, \beta; D) = \prod_{i=1}^{n} \left[h_0(t_i; \theta) \exp(x_i'\beta)\right]^{\delta_i} \exp\left[-H_0(t_i; \theta) \exp(x_i'\beta)\right], \tag{51}$$

where D = (t_i, δ_i, x_i, i = 1, 2, …, n), which can be described as a matrix with dimensions (3 × n), where each column contains the components of the event data for individual i. The first row of D corresponds to the observed time t_i, the second row represents the event indicator δ_i (with 1 indicating an event occurrence and 0 for censored data), and the third row denotes the set of covariates or explanatory variables x_i associated with the individual. More precisely, we can write

$$D = \begin{pmatrix} t_1 & t_2 & \cdots & t_n \\ \delta_1 & \delta_2 & \cdots & \delta_n \\ x_1 & x_2 & \cdots & x_n \end{pmatrix}.$$

The natural logarithm of the likelihood function, commonly known as the log-likelihood function, is given by

$$\ell(\theta, \beta; D) = \sum_{i=1}^{n} \delta_i \left[\log h_0(t_i; \theta) + x_i'\beta\right] - \sum_{i=1}^{n} H_0(t_i; \theta) \exp(x_i'\beta). \tag{52}$$

In this equation, we recall that h₀(t; θ) and H₀(t; θ) represent the baseline HRF and CHRF for the proposed baseline distributions derived from the Sine-G family, respectively. The complete log-likelihood function of the Sine-G PH regression model follows directly from this formulation once a baseline is chosen.
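The right-censored PH log-likelihood, in which each individual contributes δ_i[log h₀(t_i) + x_i′β] − H₀(t_i) exp(x_i′β), can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code. The sine-exponential baseline uses an assumed rate parametrization, and the function names and the toy data are hypothetical.

```python
import math

HALF_PI = math.pi / 2.0


def se_log_hazard(t, lam):
    """Log baseline hazard of the sine-exponential (assumed rate lam)."""
    G = 1.0 - math.exp(-lam * t)
    g = lam * math.exp(-lam * t)
    return (math.log(HALF_PI * g) + math.log(math.cos(HALF_PI * G))
            - math.log(1.0 - math.sin(HALF_PI * G)))


def se_cumhaz(t, lam):
    """Baseline cumulative hazard of the sine-exponential."""
    G = 1.0 - math.exp(-lam * t)
    return -math.log(1.0 - math.sin(HALF_PI * G))


def ph_log_likelihood(times, events, covariates, beta, lam):
    """Sum over individuals of
    delta_i * (log h0(t_i) + x_i'beta) - H0(t_i) * exp(x_i'beta)."""
    total = 0.0
    for t, delta, x in zip(times, events, covariates):
        eta = sum(xj * bj for xj, bj in zip(x, beta))
        total += delta * (se_log_hazard(t, lam) + eta)
        total -= se_cumhaz(t, lam) * math.exp(eta)
    return total


if __name__ == "__main__":
    # Toy data (made up): time, event indicator, treatment group.
    times = [0.5, 1.2, 2.0, 3.1]
    events = [1, 0, 1, 0]
    covariates = [[0.0], [1.0], [1.0], [0.0]]
    print(ph_log_likelihood(times, events, covariates, beta=[0.4], lam=0.6))
```

A censored individual (δ_i = 0) contributes only the log of its survival probability, which is the second term of the sum.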

The maximum likelihood estimates (MLEs) of the parameters are determined by maximizing this function. They can be obtained through an iterative optimization process, such as the Newton-Raphson algorithm. Under standard regularity conditions, the MLEs are asymptotically normal, which enables hypothesis testing and interval estimation for the model parameters.

3.1.2 Survival data generation from the PH regression models.

In our study, we used the inversion approach to generate lifetime data from the PH regression model. This technique relies on the connection between the CHRF of a survival random variable and a standard uniform random variable. Whenever the CHRF can be inverted in closed form, the method can be conveniently implemented in the R software [31].

In this study, we adopted the approach proposed by Bender et al. [32] for simulating data from the Cox PH model. Additional details can be found in references [33, 34].

The CDF is derived from the survival function using the following general equation: (53) where t represents the survival time and x denotes the covariates. Consequently, if Y is a random variable following the CDF F, then U = F(Y) follows a uniform distribution on the interval [0, 1], and (1 − U) also follows the same distribution. It can be observed that (54) which implies (55)

Considering CHRF for the PH model in Eq (50), we obtain (56)

The generation of lifetimes for the proposed PH regression model is as follows:

If the baseline HRF is strictly positive for all t, then the baseline CHRF can be inverted, allowing us to express the lifetime data for each of the PH regression models considered in this study.

Here, we used the baseline CHRF and its inverse function, i.e., H₀(t; θ) and H₀⁻¹(t; θ), to generate the lifetime data.
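The inversion approach can be sketched as follows. If U is standard uniform, then setting H₀(T) exp(x′β) = −log(1 − U) and solving for T gives T = H₀⁻¹(−log(1 − U) exp(−x′β)). For a Sine-G baseline, H₀(t) = −log[1 − sin((π/2)G(t))], so H₀⁻¹(y) = G⁻¹((2/π) arcsin(1 − exp(−y))). The Python code below illustrates this for a sine-exponential baseline. The rate parametrization G(t) = 1 − exp(−λt), the function names, and the parameter values are assumptions for the example; the authors' implementation is in R.

```python
import math
import random

HALF_PI = math.pi / 2.0


def se_cumhaz(t, lam):
    """Baseline CHRF of the sine-exponential (assumed rate lam)."""
    G = 1.0 - math.exp(-lam * t)
    return -math.log(1.0 - math.sin(HALF_PI * G))


def se_cumhaz_inverse(y, lam):
    """Solve H0(t) = y for t.

    sin(pi/2 * G) = 1 - exp(-y)  =>  G = (2/pi) * asin(1 - exp(-y)),
    then t = -log(1 - G) / lam for the exponential baseline.
    For very large y, G rounds to 1 and the time is not representable.
    """
    G = math.asin(1.0 - math.exp(-y)) / HALF_PI
    return -math.log1p(-G) / lam


def simulate_ph_times(covariates, beta, lam, rng):
    """Draw one uncensored lifetime per covariate vector."""
    times = []
    for x in covariates:
        eta = sum(xj * bj for xj, bj in zip(x, beta))
        u = rng.random()  # uniform on [0, 1), so 1 - u lies in (0, 1]
        y = -math.log(1.0 - u) * math.exp(-eta)
        times.append(se_cumhaz_inverse(y, lam))
    return times


if __name__ == "__main__":
    rng = random.Random(123)
    # Hypothetical design: 5 control and 5 treated individuals.
    design = [[0.0]] * 5 + [[1.0]] * 5
    print(simulate_ph_times(design, beta=[0.5], lam=0.3, rng=rng))
```

Right censoring can then be imposed by drawing independent censoring times and recording the minimum of the two together with the event indicator.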

3.2 Proposed PH models

3.2.1 Sine-Exponential PH model.

Assume that a random variable T has a SE baseline hazard with parameter λ. Consequently, the HRF and CHRF with covariate variable vector x are as follows: (57) and (58) respectively.

3.2.2 Sine-Weibull PH model.

Assume T has a SW baseline hazard with parameters λ and α. Consequently, the HRF and CHRF with covariate variable vector x are as follows: (59) and (60) respectively.

3.2.3 Sine-Lomax PH model.

Assume T has a SL baseline hazard with parameters λ and α. Consequently, the HRF and CHRF with covariate variable vector x are as follows: (61) and (62) respectively.

3.2.4 Sine-exponentiated exponential PH model.

Assume T has a SEE baseline hazard with parameters λ and α. Consequently, the HRF and CHRF with covariate variable vector x are as follows: (63) and (64) respectively.

3.2.5 Sine-Gompertz PH model.

Assume T has a SG baseline hazard with parameters λ and α. Consequently, the HRF and CHRF with covariate variable vector x and t ≥ 0, are respectively as follows: (65) and (66)

4 Bayesian analysis

4.1 Prior specification

In this subsection, we present general guidelines for selecting priors for regression coefficients associated with explanatory variables and baseline hazard parameters. We examined the prior independence scenario between the baseline parameters in h₀(t) (baseline hazard) and the regression coefficients. Furthermore, we assumed prior independence of the regression coefficients in a non-informative setting by using normal distributions with zero mean and a large known variance [2, 35], given by (67) where represents a normal distribution with zero mean, variance of , and regression coefficient β_j for the j-th covariate where j = 1, 2, …, J, π(h₀) signifies the prior distribution of all baseline parameters and hyperparameters in h₀(t).

For the baseline hazard parameters, say θ = (λ, α), we employ independent gamma prior distributions. Gamma distributions have positive support and are flexible: depending on the hyperparameters, they range from diffuse, weakly informative priors to concentrated ones. For these baseline parameters, we use independent Gamma(10, 10) priors, a choice made because we lack prior information from historical data or previous experiments. Such priors have been widely considered in the literature, including references [36–40]. (68) and (69)

The hyperparameter values of the prior distributions are selected based on historical data of the baseline distribution [36].

For the prior of regression coefficients, we have (70) where β′ = (β₁, β₂, …, β_J) represents a vector of regression coefficients associated with the covariates in the HRF of the survival model.

The joint prior distribution for the baseline hazard parameters and regression coefficients is expressed as follows: (71)
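Under prior independence, the joint log-prior is simply the sum of the individual log-densities. The Python sketch below illustrates this for θ = (λ, α) with Gamma(10, 10) priors and normal priors on the regression coefficients. Several points are assumptions rather than facts from the paper: the gamma distribution is taken in the shape-rate parametrization (as in Stan's `gamma`), the prior standard deviation `prior_sd = 10.0` is a hypothetical stand-in for the "large known variance", and the function names are invented for the example.

```python
import math


def gamma_logpdf(x, shape, rate):
    """Log-density of Gamma(shape, rate); -inf outside the support."""
    if x <= 0.0:
        return float("-inf")
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def normal_logpdf(x, mean, sd):
    """Log-density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_prior(lam, alpha, beta, shape=10.0, rate=10.0, prior_sd=10.0):
    """Joint log-prior under independence:
    lam, alpha ~ Gamma(shape, rate); beta_j ~ Normal(0, prior_sd)."""
    total = gamma_logpdf(lam, shape, rate) + gamma_logpdf(alpha, shape, rate)
    for b in beta:
        total += normal_logpdf(b, 0.0, prior_sd)
    return total


if __name__ == "__main__":
    print(log_prior(lam=0.8, alpha=1.1, beta=[0.3]))
```

Adding this quantity to the log-likelihood gives the unnormalized log-posterior that a sampler such as HMC targets.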

The model requires data D = (t_i, δ_i, x_i), i = 1, …, n, where t_i represents the observed lifetime for the i-th individual, δ_i denotes the censoring status (1 if the event of interest has occurred, 0 otherwise), and x_i represents the explanatory variables.

4.2 Posterior specification

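Bayes’ theorem is employed to combine prior knowledge and experimental data in the posterior distribution as follows:

$$\pi(h_0, \beta \mid D) \propto L(h_0, \beta)\, \pi(h_0, \beta), \tag{72}$$

where L(h₀, β) represents the likelihood function of (h₀, β) as represented in Eq (51), with h₀ standing for the baseline hazard parameters, and π(h₀, β) is the joint prior distribution.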

In this study, Bayesian inference is performed using the Stan software, which employs Hamiltonian Monte Carlo (HMC) algorithms. The Stan framework utilizes the No-U-Turn Sampler (NUTS) algorithm, an advanced variant of HMC, to efficiently sample from the posterior distribution (Hoffman and Gelman, 2014). The NUTS algorithm automatically tunes its parameters, such as the step size and trajectory length, during the sampling process to improve sampling efficiency [37, 41].

Unlike Gibbs sampling, which updates one parameter component at a time from its full conditional distribution, Stan updates all parameters jointly in each HMC/NUTS transition. This allows for efficient exploration of the joint posterior distribution of the parameters, even when the parameters are strongly correlated [2].

4.3 MCMC simulation

Calculating the marginal posterior distributions is difficult because it involves complex high-dimensional integrals. Neither the marginal distributions nor the normalizing constant of the joint posterior distribution can be obtained in closed form. Therefore, we employ MCMC simulation, specifically the HMC (Hamiltonian Monte Carlo) method and its adaptive NUTS (No-U-Turn sampler) variant, to approximate these integrals. We present the implementation and configuration details of these methods. The estimation procedure is carried out with STAN [42, 43].

4.3.1 HMC algorithm.

The Hamiltonian Monte Carlo (HMC) algorithm efficiently explores the posterior distribution by using derivatives of the target density function. It simulates Hamiltonian dynamics through numerical integration, followed by a Metropolis acceptance step, to draw samples from the joint density of the parameter θ and auxiliary momentum variables φ. In this sub-section, we present the HMC technique using the notation introduced by [44], aligning it with [43, 45, 46]. The goal is to sample from a density p(θ) over the parameters θ; in Bayesian analysis, this is typically the posterior p(θ|x) conditioned on the observed data x, as specified by a Stan program.

To begin, a multivariate normal distribution is commonly used as the auxiliary density, with φ drawn as φ ∼ MultiNormal(0, M). This auxiliary density does not depend on the parameters θ. The matrix M is the Euclidean metric; it rotates and rescales the target distribution so that sampling is more efficient. In Stan, the inverse metric M⁻¹ is typically set to a diagonal estimate of the posterior covariance computed during the warm-up phase.

The joint density function p(φ, θ) defines the Hamiltonian, Q(φ, θ) = −log p(φ, θ) = −log p(φ|θ) − log p(θ) = K(φ|θ) + P(θ), where K(φ|θ) = −log p(φ|θ) is the kinetic energy and P(θ) = −log p(θ) is the potential energy. In a Stan program, the potential energy is specified through the program’s definition of the log density.

The transition to a new state occurs in two stages before undergoing a Metropolis acceptance step. First, the momentum φ is sampled independently of the current parameter values as φ ∼ MultiNormal(0, M); it is not carried over between iterations. Next, Hamilton’s equations are used to evolve the joint system, which includes both the current parameter values θ and the newly sampled momentum φ:

$$\frac{d\theta}{dt} = +\frac{\partial Q}{\partial \varphi} = +\frac{\partial K}{\partial \varphi}, \qquad \frac{d\varphi}{dt} = -\frac{\partial Q}{\partial \theta} = -\frac{\partial K}{\partial \theta} - \frac{\partial P}{\partial \theta}. \tag{73}$$

Since the momentum density is independent of the target density (i.e., p(φ|θ) = p(φ)), the first term in the time derivative of the momentum, ∂K/∂θ, is zero, which yields the pair of time derivatives dθ/dt = ∂K/∂φ and dφ/dt = −∂P/∂θ.

Following previous implementations of Hamiltonian Monte Carlo (HMC), Stan employs the leapfrog integrator to solve this two-state differential equation. The leapfrog algorithm, a numerical technique designed for stable results in Hamiltonian systems, operates by taking discrete steps with a small time interval denoted as ϵ. The algorithm starts by independently sampling a new momentum term, φ ∼ MultiNormal(0, ϵ), without relying on the parameter values θ or the previous momentum value. Subsequently, the algorithm alternates between half-step updates of the momentum and full-step updates of the position: (74)

By accumulating a simulated time of Lϵ through L leapfrog steps, a final state (φ,θ⁾ is obtained. If the leapfrog integrator were perfect in terms of numerical accuracy, introducing randomness solely through generating a random momentum vector for each transition would be sufficient. However, to account for numerical integration errors, a Metropolis acceptance step is included. This step determines the probability of accepting the proposal (φ, θ) generated from transitioning from the current state (φ,θ⁾. The acceptance probability is calculated as: (75)

If the proposal is rejected, the previous parameter value is retained and used to initiate the following iteration.

In summary, the HMC algorithm begins by initializing the parameter set θ, with values either provided by the user or randomly generated by Stan. Then, for a given number of iterations, a new momentum vector is sampled and the current parameter value θ is updated by the leapfrog integrator, which simulates Hamiltonian dynamics with a discretization time interval ϵ and a specified number of steps L. Finally, a Metropolis acceptance step determines whether to move to the proposed state (φ*, θ*) or to remain in the current state (φ, θ).
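
The transition summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Stan implementation: it uses a fixed number of leapfrog steps L (static HMC rather than NUTS), an identity metric, and a toy standard bivariate normal target. The function names `leapfrog`, `hamiltonian`, and `hmc`, as well as the tuning values, are hypothetical choices made for this example.

```python
import numpy as np

def leapfrog(theta, phi, grad_P, eps, L, M_inv):
    """Run L leapfrog steps: half-step momentum, full-step position, half-step momentum."""
    theta = np.array(theta, dtype=float)
    phi = np.array(phi, dtype=float)
    for _ in range(L):
        phi = phi - 0.5 * eps * grad_P(theta)
        theta = theta + eps * (M_inv @ phi)
        phi = phi - 0.5 * eps * grad_P(theta)
    return theta, phi

def hamiltonian(theta, phi, P, M_inv):
    """Q = potential energy P(theta) + Gaussian kinetic energy (constants dropped)."""
    return P(theta) + 0.5 * phi @ M_inv @ phi

def hmc(P, grad_P, theta0, n_iter=2000, eps=0.2, L=10, seed=1):
    rng = np.random.default_rng(seed)
    d = len(theta0)
    M = np.eye(d)                      # assumed identity metric (no warm-up adaptation)
    M_inv = np.linalg.inv(M)
    theta = np.array(theta0, dtype=float)
    draws = np.empty((n_iter, d))
    n_accept = 0
    for s in range(n_iter):
        # Fresh momentum at every iteration; it is never carried over.
        phi = rng.multivariate_normal(np.zeros(d), M)
        theta_star, phi_star = leapfrog(theta, phi, grad_P, eps, L, M_inv)
        # Metropolis step: accept with probability min(1, exp(Q_current - Q_proposal)).
        log_ratio = (hamiltonian(theta, phi, P, M_inv)
                     - hamiltonian(theta_star, phi_star, P, M_inv))
        if np.log(rng.uniform()) < log_ratio:
            theta = theta_star
            n_accept += 1
        draws[s] = theta
    return draws, n_accept / n_iter

# Toy target (assumption): standard bivariate normal, so P(theta) = 0.5 * theta'theta.
def P(theta):
    return 0.5 * theta @ theta

def grad_P(theta):
    return theta

draws, acc_rate = hmc(P, grad_P, [3.0, -3.0])
print("acceptance rate:", acc_rate)
print("posterior mean estimate:", draws[500:].mean(axis=0))
```

On this toy target the acceptance rate is high because the leapfrog integrator nearly conserves Q. Stan additionally adapts ϵ and the metric during warm-up, and NUTS chooses the number of leapfrog steps dynamically.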

4.4 Model specification

Suppose that the observed data Y = (y₁, …, y_n) are modeled as independent given the parameters Ψ. This implies that \( P(Y \mid \Psi) = \prod_{i=1}^{n} P(y_i \mid \Psi) \). Assume further a prior distribution P(Ψ), which results in a posterior distribution P(Ψ|Y) and a posterior predictive distribution \( P(\tilde{y} \mid Y) = \int P(\tilde{y} \mid \Psi)\, P(\Psi \mid Y)\, d\Psi \). By employing Bayes’ theorem, we can express the joint posterior density as follows:
\[
P(\Psi \mid Y) \propto P(Y \mid \Psi)\, P(\Psi) = P(\Psi) \prod_{i=1}^{n} P(y_i \mid \Psi), \tag{76}
\]
that is, the posterior is proportional to the likelihood multiplied by the prior, where Ψ represents the model parameters and Y denotes the observed data.

4.5 The expected log predictive density

Consider data y₁, y₂, …, y_n that are independent given the parameters Ψ. The likelihood can then be decomposed into the following product of pointwise likelihoods:
\[
P(Y \mid \Psi) = \prod_{i=1}^{n} P(y_i \mid \Psi).
\]

Given a prior distribution P(Ψ), the posterior predictive distribution for new data ỹ is
\[
P(\tilde{y} \mid Y) = \int P(\tilde{y} \mid \Psi)\, P(\Psi \mid Y)\, d\Psi.
\]

The expected log predictive density (ELPD) is given by
\[
\mathrm{elpd} = \sum_{i=1}^{n} \int p_t(\tilde{y}_i)\, \log P(\tilde{y}_i \mid Y)\, d\tilde{y}_i, \tag{77}
\]
where log P(ỹ_i|Y) is the log predictive density of the model for a new observation ỹ_i that has been generated by some true, unknown process p_t(ỹ_i). The ELPD is a measure of out-of-sample predictive performance for a model: it is the expectation, under the true data-generating process, of the log predictive density of future observations, summed over the n data points. Because p_t is unknown, the ELPD is estimated in practice by leave-one-out cross-validation (LOOIC) or by WAIC, both of which approximate the predictive density of each data point using the posterior distribution of the model parameters.

4.6 Out-of-sample pointwise predictive accuracy estimation

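For a new data point ỹ_i, the out-of-sample predictive fit is indicated as follows:
\[
\log P_{\mathrm{post}}(\tilde{y}_i) = \log E_{\mathrm{post}}\bigl[P(\tilde{y}_i \mid \Psi)\bigr] = \log \int P(\tilde{y}_i \mid \Psi)\, P_{\mathrm{post}}(\Psi)\, d\Psi, \tag{78}
\]
where P_post(ỹ_i) represents the predictive density for ỹ_i induced by the posterior distribution P_post(Ψ). Note that we use the notation P_post to indicate the posterior distribution.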

4.7 The Watanabe-Akaike Information Criteria (WAIC)

The WAIC is a Bayesian information criterion used to evaluate the out-of-sample predictive accuracy of statistical models [11, 13]. It takes a more fully Bayesian approach than other information criteria, such as the Akaike information criterion (AIC), because it averages over the posterior distribution rather than conditioning on a point estimate. To construct the WAIC, we first compute the log pointwise posterior predictive density for each data point y_i. This is accomplished by averaging the predictive density P(y_i|Ψ) over the posterior distribution of the model parameters and taking the logarithm. The posterior variance of the log predictive density log P(y_i|Ψ) is then calculated for every data point. Summing these variances across all the data points gives the effective number of parameters.

A model’s complexity is measured by its effective number of parameters. It refers to the number of model parameters that are utilized to accurately predict the data. It is less likely for a model to overfit the data if it has fewer effective parameters. After determining the effective number of parameters, we can compute WAIC using the following formula: (79) where the expectation E_post is an average of Ψ over its posterior distribution. The elpd_WAIC is thus obtained as the following difference: (80)

After that, we get the WAIC defined as (81)

The WAIC can be used to assess different models and determine which model is most likely to accurately predict the data. The optimal model is one with a lower WAIC value.
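
Given posterior draws Ψ^(1), …, Ψ^(S), these quantities can be computed from an S × n matrix of pointwise log-likelihoods. The sketch below follows the standard definitions in [13, 14], which we assume correspond to the expressions above: lppd = Σ_i log((1/S) Σ_s P(y_i|Ψ^(s))), p_WAIC = Σ_i var_s(log P(y_i|Ψ^(s))), elpd_WAIC = lppd − p_WAIC, and WAIC = −2 elpd_WAIC. The toy normal model, the simulated data, and the function names are illustrative assumptions and are not part of the analysis in this paper.

```python
import numpy as np

def log_mean_exp(a, axis=0):
    """Numerically stable log of the mean of exp(a) along an axis."""
    a_max = a.max(axis=axis, keepdims=True)
    out = a_max + np.log(np.mean(np.exp(a - a_max), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)

def waic(log_lik):
    """log_lik: array of shape (S draws, n observations) holding log P(y_i | Psi^(s))."""
    lppd_i = log_mean_exp(log_lik, axis=0)       # log of the posterior-mean density
    p_waic_i = log_lik.var(axis=0, ddof=1)       # posterior variance of the log density
    elpd_i = lppd_i - p_waic_i
    return {
        "lppd": float(lppd_i.sum()),
        "p_waic": float(p_waic_i.sum()),
        "elpd_waic": float(elpd_i.sum()),
        "waic": float(-2.0 * elpd_i.sum()),
    }

# Toy model (assumption): y_i ~ Normal(mu, 1) with a flat prior on mu,
# so the posterior of mu is Normal(mean(y), 1/n) and can be sampled directly.
rng = np.random.default_rng(0)
n, S = 50, 4000
y = rng.normal(0.0, 1.0, size=n)
mu_draws = y.mean() + rng.normal(size=S) / np.sqrt(n)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2

result = waic(log_lik)
print(result)
```

In an R and Stan workflow, the same log-likelihood matrix is typically produced in the generated quantities block of the Stan program and passed to the loo package. For this one-parameter toy model, the effective number of parameters should come out close to one.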

4.8 The Leave-One-Out Information Criteria (LOOIC)

The LOOIC is a model selection criterion that estimates the out-of-sample predictive performance of a statistical model. It is based on the idea of leaving out one observation at a time, fitting the model to the remaining data, and then using the fitted model to predict the left-out observation. This process is repeated for all observations in the dataset, and the average of the predicted log-likelihoods is used to assess the model’s out-of-sample predictive performance. The model with the lowest LOOIC is considered to be the best model.

The leave-one-out (LOO) technique is a method used to estimate the ELPD or generalization performance of a model. It involves training the model on all observations except for a particular observation y_i, and then predicting the held-out observation y_i. This process is repeated for each of the n observations, treating each observation y_i as a pseudo-Monte Carlo sample from the true generating model p_t. Consequently, we obtain n LOO posterior distributions, denoted as p(Ψ|y_−i), where y_−i represents the data with observation y_i removed [12].

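Using the LOO posteriors, we can estimate the ELPD, as shown in Eq (77), using the following formula:
\[
\mathrm{elpd}_{\mathrm{loo}} = \sum_{i=1}^{n} \log P(y_i \mid y_{-i}), \tag{82}
\]
\[
P(y_i \mid y_{-i}) = \int P(y_i \mid \Psi)\, P(\Psi \mid y_{-i})\, d\Psi, \tag{83}
\]
where P(y_i|Ψ) represents the likelihood, and P(Ψ|y_−i) denotes the posterior distribution for Ψ when we exclude the observation y_i.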
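
Refitting the model n times is rarely necessary in practice. A common shortcut, described in [14], reuses draws from the full posterior as importance samples with weights proportional to 1/P(y_i|Ψ^(s)), so that P(y_i|y_−i) ≈ 1 / ((1/S) Σ_s 1/P(y_i|Ψ^(s))). The sketch below implements this raw importance-sampling estimate and checks it against the exact LOO value for a toy normal model with known unit variance and a flat prior, for which P(y_i|y_−i) is normal with mean ȳ_−i and variance 1 + 1/(n − 1). Pareto-smoothed importance sampling, as in [14], stabilizes the weights further and is not reproduced here; the toy model and the function names are assumptions made for illustration.

```python
import numpy as np

def elpd_loo_is(log_lik):
    """Raw importance-sampling LOO from an (S draws, n observations) log-likelihood matrix."""
    neg = -log_lik                                   # log of 1 / P(y_i | Psi^(s))
    m = neg.max(axis=0)
    log_mean_inv = m + np.log(np.mean(np.exp(neg - m), axis=0))
    elpd_i = -log_mean_inv                           # log P(y_i | y_{-i}) estimate
    return float(elpd_i.sum())

def elpd_loo_exact_normal(y):
    """Exact LOO for y_i ~ Normal(mu, 1) with a flat prior on mu (toy model only)."""
    n = y.size
    loo_mean = (y.sum() - y) / (n - 1)               # mean of the data without y_i
    var = 1.0 + 1.0 / (n - 1)                        # predictive variance
    log_dens = -0.5 * np.log(2 * np.pi * var) - 0.5 * (y - loo_mean) ** 2 / var
    return float(log_dens.sum())

rng = np.random.default_rng(0)
n, S = 50, 4000
y = rng.normal(0.0, 1.0, size=n)
mu_draws = y.mean() + rng.normal(size=S) / np.sqrt(n)   # full-data posterior draws
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2

elpd_is = elpd_loo_is(log_lik)
elpd_exact = elpd_loo_exact_normal(y)
looic = -2.0 * elpd_is                                   # LOOIC on the deviance scale
print(elpd_is, elpd_exact, looic)
```

The LOOIC reported in model comparisons is −2 times the estimated elpd_loo, which is why lower values indicate better expected predictive performance.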

5 Practical applications

We use two datasets from the literature to illustrate the Sine-G family of baseline distributions for modeling censored survival data. The two datasets exhibit contrasting hazard rate function (HRF) shapes: the total time on test (TTT) plot of the first indicates a monotonic HRF (Fig 4), whereas that of the second indicates a non-monotonic HRF (Fig 5).
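
As background for Figs 4 and 5, the scaled TTT transform of an ordered sample y_(1) ≤ … ≤ y_(n) is T(r/n) = [Σ_{i=1}^{r} y_(i) + (n − r) y_(r)] / Σ_{i=1}^{n} y_(i), plotted against r/n. A concave curve above the diagonal points to an increasing hazard, a convex curve below it to a decreasing hazard, and a curve that is first concave and then convex to a unimodal hazard. The sketch below computes the transform for complete (uncensored) samples only, which is a simplifying assumption; the simulated Weibull samples and the function name `scaled_ttt` are illustrative and are not the datasets analyzed here.

```python
import numpy as np

def scaled_ttt(times):
    """Return (r/n, T(r/n)) for the scaled total-time-on-test transform."""
    y = np.sort(np.asarray(times, dtype=float))
    n = y.size
    r = np.arange(1, n + 1)
    # Time accumulated by failed units plus time accumulated by units still at risk.
    ttt = (np.cumsum(y) + (n - r) * y) / y.sum()
    return r / n, ttt

rng = np.random.default_rng(7)
# Weibull shape > 1 gives an increasing hazard; shape < 1 gives a decreasing hazard.
u_inc, t_inc = scaled_ttt(rng.weibull(3.0, size=200))
u_dec, t_dec = scaled_ttt(rng.weibull(0.5, size=200))

print("increasing hazard, mean height above diagonal:", np.mean(t_inc - u_inc))
print("decreasing hazard, mean height above diagonal:", np.mean(t_dec - u_dec))
```

Plotting the second returned array against the first gives the TTT plot, with the diagonal corresponding to a constant hazard.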

Fig 4. TTT plot for the gastric cancer data set.

https://doi.org/10.1371/journal.pone.0307410.g004

Fig 5. TTT plot for the Alloauto data set.

https://doi.org/10.1371/journal.pone.0307410.g005

5.1 Data I

In this section, we analyze a dataset from a clinical trial in gastric oncology that contains right-censored data. Our objective is to demonstrate the application of the proposed parametric PH model to right-censored lifetime data. First, we carry out a Bayesian analysis using the rstan package to fit the PH model with each of the competing baseline hazards: the SW, SG, SL, and SEE distributions. Subsequently, we compare the models using two evaluation metrics: the WAIC and the LOOIC. Finally, we interpret the obtained results.

5.1.1 Data source and description.

In this analysis, we examine the gastric cancer data collected from the Gastrointestinal Tumor Study Group (1982). This dataset has been widely considered in studies focusing on crossing survival curves, particularly in the field of survival analysis. Some notable studies that have utilized this dataset include Demarqui and Mayrink [47], Muse et al. [17], and Diao et al. [48]. The dataset, labeled “gastric,” is freely accessible through the R package AmoudSurv [49].

The clinical trial in oncology comprises 90 patients diagnosed with locally advanced gastric cancer. These patients were randomly divided into two groups: (i) a control group consisting of 45 patients who received chemotherapy, and (ii) a treatment group consisting of 45 patients who underwent radiation therapy in addition to chemotherapy. The study followed these patients for a duration of approximately 5 years. Each patient’s data in the dataset includes three variables: the response time, which indicates either failure (time to death) or right censoring; a binary failure indicator identifying patients who experienced the event of interest; and a binary group indicator (1) indicating the type of treatment received.

5.1.2 Bayesian analysis application.

In this part, we use the McMC samples to summarize the posterior properties of the proposed fully parametric PH models with the Sine-G baseline distributions.

5.1.3 Results.

Table 1 presents the results, allowing us to assess various posterior characteristics of interest and their corresponding numerical values.

Table 1. Results for the posterior properties of the SW-PH, SL-PH, SG-PH and SEE-PH models.

https://doi.org/10.1371/journal.pone.0307410.t001

Table 1 summarizes the posterior properties of four different models: SW-PH, SL-PH, SG-PH, and SEE-PH. It provides estimates, standard errors, quantiles, effective sample sizes (N_eff), and potential scale reduction factors (R̂) for each model’s parameters. The estimates and their uncertainties characterize the posterior distributions, while N_eff and R̂ provide information about the convergence of the Markov chain Monte Carlo algorithm. These findings are essential for understanding the posterior distributions and making inferences about the models’ parameters.

5.1.4 Convergence diagnostics.

To assess the convergence of the algorithm for the proposed regression models, we employed numerical and visual methods. The summary results in Table 1 indicate that the HMC-NUTS algorithm used in the McMC procedure converged to the joint posterior distribution. Several indicators support this conclusion: the potential scale reduction factor (R̂) is 1, the effective sample size (n_eff) exceeds 400, and the Monte Carlo standard error (SE) is less than 5% of the posterior standard deviation for all parameters.
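
These numerical criteria can be made concrete with a short sketch. The split potential scale reduction factor compares between-chain and within-chain variance after halving each chain, and the Monte Carlo standard error equals the posterior standard deviation divided by the square root of n_eff, so requiring it to stay below 5% of the posterior standard deviation is equivalent to requiring n_eff > 400. The code uses the classical split-R̂ formula rather than the rank-normalized variant proposed more recently, and the simulated chains and function names are assumptions for illustration only.

```python
import numpy as np

def split_rhat(chains):
    """Classical split R-hat for an array of shape (m chains, n draws)."""
    chains = np.asarray(chains, dtype=float)
    half = chains.shape[1] // 2
    # Split every chain in two so that within-chain drift is also detected.
    split = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    n = split.shape[1]
    W = split.var(axis=1, ddof=1).mean()        # mean within-chain variance
    B = n * split.mean(axis=1).var(ddof=1)      # between-chain variance
    var_plus = (n - 1) / n * W + B / n          # pooled variance estimate
    return float(np.sqrt(var_plus / W))

def mcse_ratio(n_eff):
    """Monte Carlo standard error as a fraction of the posterior standard deviation."""
    return 1.0 / np.sqrt(n_eff)

rng = np.random.default_rng(42)
mixed = rng.normal(size=(4, 1000))                           # four well-mixed chains
stuck = mixed + np.array([0.0, 0.0, 3.0, 3.0])[:, None]      # two chains shifted away

rhat_mixed = split_rhat(mixed)
rhat_stuck = split_rhat(stuck)
print("R-hat, well mixed:", rhat_mixed)
print("R-hat, not mixed: ", rhat_stuck)
print("MCSE / sd at n_eff = 1600:", mcse_ratio(1600))
```

Values of R̂ near 1 indicate that the chains agree with one another, whereas the shifted chains produce a value well above 1.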

Visual assessment of convergence is commonly performed by examining kernel density, autocorrelation, and trace plots [17, 45]. The trace plots in Figs 6–9 show a stationary pattern fluctuating within a stable range, providing visual evidence of the McMC algorithm’s convergence. Additionally, the autocorrelation plots in Figs 10–13 show a rapid decrease in autocorrelation as the lag increases. This indicates satisfactory mixing and convergence of the algorithm towards the target posterior distribution.

Fig 6. The trace plots of the posterior parameters for the SW-PH model using gastric cancer data.

https://doi.org/10.1371/journal.pone.0307410.g006

Fig 7. The trace plots of the posterior parameters for the SL-PH model using gastric cancer data.

https://doi.org/10.1371/journal.pone.0307410.g007

Fig 8. The trace plots of the posterior parameters for the SG-PH model using gastric cancer data.

https://doi.org/10.1371/journal.pone.0307410.g008

Fig 9. The trace plots of the posterior parameters for the SEE-PH model using gastric cancer data.

https://doi.org/10.1371/journal.pone.0307410.g009

Fig 10. The autocorrelation plots of the posterior parameters for the SW-PH model using gastric cancer data.

https://doi.org/10.1371/journal.pone.0307410.g010

Fig 11. The autocorrelation plots of the posterior parameters for the SL-PH model using gastric cancer data.

https://doi.org/10.1371/journal.pone.0307410.g011

Fig 12. The autocorrelation plots of the posterior parameters for the SG-PH model using gastric cancer data.

https://doi.org/10.1371/journal.pone.0307410.g012

Fig 13. The autocorrelation plots of the posterior parameters for the SEE-PH model using gastric cancer data.

https://doi.org/10.1371/journal.pone.0307410.g013

Table 2 presents the Bayesian model comparison results for four different models: SW-PH, SL-PH, SG-PH, and SEE-PH models. The comparison is based on two evaluation metrics: the WAIC and the LOOIC. The WAIC and LOOIC values provide a measure of the models’ goodness of fit and complexity. Lower values indicate better model performance.

Table 2. Bayesian model comparison for the SW-PH, SL-PH, SG-PH, and SEE-PH models.

https://doi.org/10.1371/journal.pone.0307410.t002

From Table 2, we can observe the following:

The SL-PH model has the lowest WAIC and LOOIC values, indicating the best overall fit to the data among the four models.
The SW-PH model also shows relatively low WAIC and LOOIC values, suggesting a good fit to the data.
The SEE-PH model and SG-PH model have higher WAIC and LOOIC values compared to the other models, indicating poorer fit or higher complexity.

Table 3. Results for the posterior properties of the SW-PH, SL-PH, SG-PH and SEE-PH models using dataset II.

https://doi.org/10.1371/journal.pone.0307410.t003

Based on the results in Table 2, the SL-PH model appears to be the most suitable choice for modeling the data, followed closely by the SW-PH model. However, further analysis and consideration of other factors related to the specific research context may be necessary to make a final model selection.

5.2 Data II

Klein and Moeschberger [50] describe a study of 101 patients diagnosed with advanced acute myelogenous leukemia. Of these patients, 51 underwent an autologous (auto) bone marrow transplant, while 50 received an allogeneic (allo) transplant. The survival times (measured in months) were censored for 28 patients in the auto group and 22 patients in the allo group. The TTT plot in Fig 5 suggests that the hazard function is unimodal.

5.2.1 Results.

Table 3 presents the results, allowing us to assess various posterior characteristics of interest and their corresponding numerical values using dataset II.

Table 3 summarizes the posterior properties of four different models: SW-PH, SL-PH, SG-PH, and SEE-PH. It provides estimates, standard errors, quantiles, effective sample sizes (N_eff), and potential scale reduction factors (R̂) for each model’s parameters. The estimates and their uncertainties characterize the posterior distributions, while N_eff and R̂ provide information about the convergence of the Markov chain Monte Carlo algorithm. These findings are essential for understanding the posterior distributions and making inferences about the models’ parameters.

5.2.2 Convergence diagnostics.

Figs 14–17 present the trace plots for all the competing models considered in our study.

Fig 14. The trace plots of the posterior parameters for the SW-PH model using Alloauto data.

https://doi.org/10.1371/journal.pone.0307410.g014

Fig 15. The trace plots of the posterior parameters for the SL-PH model using Alloauto data.

https://doi.org/10.1371/journal.pone.0307410.g015

Fig 16. The trace plots of the posterior parameters for the SG-PH model using Alloauto data.

https://doi.org/10.1371/journal.pone.0307410.g016

Fig 17. The trace plots of the posterior parameters for the SEE-PH model using Alloauto data.

https://doi.org/10.1371/journal.pone.0307410.g017

5.2.3 Model comparison for Dataset II.

For the SW-PH model, the WAIC and LOOIC values are both 451.3. The SL-PH model has a slightly lower WAIC and LOOIC of 450.60. The SG-PH model has a higher WAIC and LOOIC of 481.70, while the SEE-PH model has a WAIC and LOOIC of 472.30.

Based on this comparison, the SL-PH model appears to have the best fit to the Alloauto dataset, as it has the lowest WAIC and LOOIC values among the models evaluated (Table 4), although its margin over the SW-PH model is small (450.60 versus 451.3).
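
The comparison can be tabulated directly from the values quoted above. The short sketch below ranks the four models and reports each model's difference from the best one; the dictionary simply restates the numbers in the text, and the variable names are illustrative. Since standard errors of these differences are not reported here, the ranking alone does not show whether the 0.7-point gap between SL-PH and SW-PH is meaningful.

```python
# WAIC and LOOIC coincide for each model in the text, so one dictionary suffices.
looic = {"SW-PH": 451.3, "SL-PH": 450.60, "SG-PH": 481.70, "SEE-PH": 472.30}

# Lower values indicate better expected out-of-sample predictive performance.
ranked = sorted(looic, key=looic.get)
best = ranked[0]
delta = {model: round(looic[model] - looic[best], 1) for model in ranked}

for model in ranked:
    print(f"{model:7s} {looic[model]:7.1f} {delta[model]:6.1f}")
```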

Table 4. Bayesian model comparison for the SW-PH, SL-PH, SG-PH, and SEE-PH models using Alloauto dataset.

https://doi.org/10.1371/journal.pone.0307410.t004

6 Discussions and conclusions

Our primary objective in this study was to develop and evaluate a Bayesian survival modeling approach for PH regression models. Specifically, we focused on utilizing the Sine-G family of distributions as a baseline hazard family.

By incorporating the Sine-G family, which includes distributions such as the Weibull, exponentiated exponential, Lomax, and Gompertz distributions, we aimed to enhance the modeling capabilities of our approach. These distributions offer a flexible range of hazard forms, allowing us to capture various patterns observed in survival data.

Through rigorous evaluation and analysis, we compared the performance of different baseline hazards within the Sine-G family. Our goal was to identify the most suitable choice for our proposed Bayesian survival modeling approach.

Based on our findings, we determined that the SL-PH model demonstrated superior performance compared to the other baseline hazards considered. This model exhibited greater flexibility and accuracy in representing the HRF within proportional hazard regression models.

By utilizing the SL-PH model, we were able to effectively capture a wide range of hazard shapes, including increasing, decreasing, and bathtub-shaped hazards. This flexibility enabled a more comprehensive understanding of the underlying survival patterns and improved the accuracy of our predictions.

To implement our proposed approach, we used the R programming language and the STAN probabilistic programming framework. This combination ensured efficient computation and reliable inference of model parameters, thereby enhancing the reliability and applicability of our findings.

In summary, our study successfully developed and evaluated a Bayesian survival modeling approach for PH regression models using the Sine-G family of distributions as baseline hazards. The SL-PH model emerged as the preferred choice due to its superior flexibility and accuracy in representing hazard forms. Our findings contribute to advancing the field of survival analysis and provide a valuable tool for researchers and practitioners working with PH regression models in medical research and clinical practice.

7 Future work

Future work in this area could involve exploring additional distributions within the Sine-G family, incorporating covariates into the proportional hazard regression models, comparing alternative approaches, validating the proposed approach using cross-validation or external datasets, and applying the methodology to other disease domains.

We also plan to develop and extend the trigonometric probability distributions to parametric survival regression models, cure models, competing risk models, frailty models, multistate models, longitudinal and joint models, as well as spatial parametric survival models. These efforts will further enhance the flexibility, predictive power, and generalizability of Bayesian survival modeling, making it a valuable tool for analyzing time-to-event data in various medical research and clinical settings.

Supporting information

S1 File.

https://doi.org/10.1371/journal.pone.0307410.s001

(DOCX)

Acknowledgments

The Researchers would like to thank the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support (QU-APC-2025).

References

1. Wienke A. Frailty models in survival analysis; 2007.
2. Muse AH, Mwalili S, Ngesa O, Chesneau C, Alshanbari HM, El-Bagoury AAH. Amoud class for hazard-based and odds-based regression models: Application to oncology studies. Axioms. 2022;11(11):606.
3. Soodejani MT, Tabatabaei SM, Mahmoudimanesh M. Bayesian statistics versus classical statistics in survival analysis: an applicable example. American Journal of Cardiovascular Disease. 2021;11(4):484.
4. Tomy L, Satish G. A review study on trigonometric transformations of statistical distributions. Biom Biostat Int J. 2021;10(4):130–136.
5. Souza L, Junior W, De Brito C, Chesneau C, Ferreira T, Soares L. On the Sin-G class of distributions: theory, model and application. Journal of Mathematical Modeling. 2019;7(3):357–379.
6. Mahmood Z, Chesneau C, Tahir MH. A new sine-G family of distributions: properties and applications. Bull Comput Appl Math. 2019;7(1):53–81.
7. Jamal F, Chesneau C, Bouali DL, Ul Hassan M. Beyond the Sin-G family: The transformed Sin-G family. PLoS ONE. 2021;16(5):e0250790. pmid:33974643
8. Chesneau C, Jamal F. The sine Kumaraswamy-G family of distributions. Journal of Mathematical Extension. 2020;15.
9. Muhammad M, Alshanbari HM, Alanzi AR, Liu L, Sami W, Chesneau C, et al. A new generator of probability models: the exponentiated sine-G family for lifetime studies. Entropy. 2021;23(11):1394. pmid:34828091
10. Rajkumar J, Sakthivel K. A New Method of Generating Marshall–Olkin Sine–G Family and Its Applications in Survival Analysis. Lobachevskii Journal of Mathematics. 2022;43(2):463–472.
11. Watanabe S, Opper M. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of machine learning research. 2010;11(12).
12. Magnusson M, et al. Leave-one-out cross-validation for Bayesian model comparison in large data. In: International conference on artificial intelligence and statistics. PMLR; 2020. p. 341–351.
13. Gelman A, Hwang J, Vehtari A. Understanding predictive information criteria for Bayesian models. Statistics and computing. 2014;24(6):997–1016.
14. Vehtari A, Gelman A, Gabry J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and computing. 2017;27(5):1413–1432.
15. Khan SA, Khosa SK. Generalized log-logistic proportional hazard model with applications in survival analysis. Journal of Statistical Distributions and Applications. 2016;3(1):1–18.
16. Khan SA. Exponentiated Weibull regression for time-to-event data. Lifetime data analysis. 2018;24(2):328–354. pmid:28349290
17. Muse AH, Chesneau C, Ngesa O, Mwalili S. Flexible parametric accelerated hazard model: Simulation and application to censored lifetime data with crossing survival curves. Mathematical and Computational Applications. 2022;27(6):104.
18. Al-Essa LA, Eliwa MS, El-Morshedy M, Alqifari H, Yousof HM. Flexible extension of the Lomax distribution for asymmetric data under different failure rate profiles: Characteristics with applications for failure modeling and service times for aircraft windshields. Processes. 2023;11(7):2197.
19. Tashkandy YA, Almetwally EM, Ragab R, Gemeay AM, Abd El-Raouf M, Khosa SK, et al. Statistical inferences for the extended inverse Weibull distribution under progressive type-II censored sample with applications. Alexandria Engineering Journal. 2023;65:493–502.
20. El-Morshedy M, Eliwa MS, Tahir MH, Alizadeh M, El-Desokey R, Al-Bossly A, et al. A Bivariate Extension to Exponentiated Inverse Flexible Weibull Distribution: Shock Model, Features, and Inference to Model Asymmetric Data. Symmetry. 2023;15(2):411.
21. Muse AH, Tolba AH, Fayad E, Abu Ali OA, Nagy M, Yusuf M. Modelling the COVID-19 mortality rate with a new versatile modification of the log-logistic distribution. Computational Intelligence and Neuroscience. 2021;2021. pmid:34782836
22. Rubio FJ, Alvares D, Redondo-Sanchez D, Marcos-Gragera R, Sánchez MJ, Luque-Fernandez MA. Bayesian variable selection and survival modeling: assessing the Most important comorbidities that impact lung and colorectal cancer survival in Spain. BMC Medical Research Methodology. 2022;22(1):1–14. pmid:35369875
23. Cox DR. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological). 1972;34(2):187–202.
24. Kumar D, Klefsjö B. Proportional hazards model: a review. Reliability Engineering & System Safety. 1994;44(2):177–188.
25. Ewnetu WB, Gijbels I, Verhasselt A. Flexible hazard-based models and quantile regression for right-censored data using two-piece asymmetric distributions. 2022;.
26. Mastor ABS, Alghamdi AS, Ngesa O, Mung’atu J, Chesneau C, Afify AZ. The extended exponential-Weibull accelerated failure time model with application to Sudan COVID-19 Data. Mathematics. 2023;11(2):460.
27. Irfan M, Usman M, Saidi S, Kurniasari D, et al. Survival analysis using cox proportional hazard regression approach in dengue hemorrhagic fever (DHF) case in Abdul Moeloek hospital Bandar Lampung in 2019. In: Journal of Physics: Conference Series. vol. 1751. IOP Publishing; 2021. p. 012011.
28. Rezaei S, Hashami S, Najjar L. Extended exponential geometric proportional hazard model. Annals of Data Science. 2014;1(2):173–189.
29. Balakrishnan N, Barmalzan G, Haidari A. Modified proportional hazard rates and proportional reversed hazard rates models via Marshall-Olkin distribution and some stochastic comparisons. Journal of the Korean Statistical Society. 2018;47(1):127–138.
30. Muse AH, Ngesa O, Mwalili S, Alshanbari HM, El-Bagoury AAH. A Flexible Bayesian Parametric Proportional Hazard Model: Simulation and Applications to Right-Censored Healthcare Data. Journal of Healthcare Engineering. 2022;2022. pmid:35693888
31. Muse AH. Bayesian and Frequentist Approaches for Flexible Parametric Hazard-Based Regression Models with Generalized Log-logistic Baseline Distribution: An Application to Right-Censored Oncology Data Sets. JKUAT-PAUSTI; 2024.
32. Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Statistics in medicine. 2005;24(11):1713–1723. pmid:15724232
33. Austin PC. Generating survival times to simulate Cox proportional hazards models with time-varying covariates. Statistics in medicine. 2012;31(29):3946–3958. pmid:22763916
34. Muse AH, Mwalili S, Ngesa O, Chesneau C, Al-Bossly A, El-Morshedy M. Bayesian and Frequentist Approaches for a Tractable Parametric General Class of Hazard-Based Regression Models: An Application to Oncology Data. Mathematics. 2022;10(20):3813.
35. Lázaro E, Armero C, Alvares D. Bayesian regularization for flexible baseline hazard functions in Cox survival models. Biometrical Journal. 2021;63(1):7–26. pmid:32885493
36. Muse AH, Mwalili S, Ngesa O, Almalki SJ, Abd-Elmougod GA. Bayesian and classical inference for the generalized log-logistic distribution with applications to survival data. Computational intelligence and neuroscience. 2021;2021. pmid:34671390
37. Al-Aziz SN, Muse AH, Jawad TM, Sayed-Ahmed N, Aldallal R, Yusuf M. Bayesian inference in a generalized log-logistic proportional hazards model for the analysis of competing risk data: An application to stem-cell transplanted patients data. Alexandria Engineering Journal. 2022;61(12):13035–13050.
38. Alvares D, Rubio FJ. A tractable Bayesian joint model for longitudinal and survival data. Statistics in Medicine. 2021;40(19):4213–4229. pmid:34114254
39. Alvares D, Lázaro E, Gómez-Rubio V, Armero C. Bayesian survival analysis with BUGS. Statistics in Medicine. 2021;40(12):2975–3020. pmid:33713474
40. Alotaibi N, Al-Moisheer A, Elbatal I, Elgarhy M, Almetwally EM. Bayesian and Non-Bayesian Analysis for the Sine Generalized Linear Exponential Model under Progressively Censored Data;.
41. Ashraf-Ul-Alam M, Khan AA. Generalized Topp-Leone-Weibull AFT Modelling: A Bayesian Analysis with MCMC Tools Using R and Stan. Austrian Journal of Statistics. 2021;50(5):52–76.
42. Abba B, Wang H. A new failure times model for one and two failure modes system: A Bayesian study with Hamiltonian Monte Carlo simulation. Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability. 2024;238(2):304–323.
43. Thach TT, Bris R. Improved new modified Weibull distribution: A Bayes study using Hamiltonian Monte Carlo simulation. Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability. 2020;234(3):496–511.
44. Betancourt M, Girolami M. Hamiltonian Monte Carlo for hierarchical models. Current trends in Bayesian methodology with applications. 2015;79(30):2–4.
45. Abushal TA, Kumar J, Muse AH, Tolba AH. Estimation for Akshaya failure model with competing risks under progressive censoring scheme with analyzing of thymic lymphoma of mice application. Complexity. 2022;2022:1–27.
46. Abba B, Wang H, Muhammad M, Bakouch HS. A robust bathtub-shaped failure time model for a two-component system with applications to complete and censored reliability data. Quality Technology & Quantitative Management. 2024;21(3):309–339.
47. Demarqui FN, Mayrink VD. Yang and Prentice model with piecewise exponential baseline distribution for modeling lifetime data with crossing survival curves. Brazilian Journal of Probability and Statistics. 2021;35(1):172–186.
48. Diao G, Zeng D, Yang S. Efficient semiparametric estimation of short-term and long-term hazard ratios with right-censored data. Biometrics. 2013;69(4):840–849. pmid:24328712
49. Muse AH, Mwalili S, Ngesa O, Chesneau C. AmoudSurv: An R Package for Tractable Parametric Odds-Based Regression Models; 2022. https://cran.r-project.org/web//packages/AmoudSurv/index.html.
50. Klein JP, Moeschberger ML, et al. Survival analysis: techniques for censored and truncated data. vol. 1230. Springer; 2003.

