Survival analysis of cancer patients using a new extended Weibull distribution

One of the most important applications of statistical analysis is in health research. Cancer studies often require special statistical considerations in order to find an appropriate model for fitting survival data. Existing classical distributions rarely fit such data well, and increasing interest has recently been shown in developing more flexible distributions by introducing additional parameters to the basic model. In this paper, a new five-parameter distribution, referred to as the alpha power Kumaraswamy Weibull distribution, is introduced and studied. In particular, this distribution extends the Weibull distribution using a novel technique that combines two well-known generalisation methods, namely the alpha power and T-X transformations. Different characteristics of the proposed distribution, including moments, quantiles, Rényi entropy and order statistics, are obtained. The method of maximum likelihood is applied to estimate the model parameters based on complete and censored data. The performance of these estimators is examined through simulation studies. The potential importance and applicability of the proposed distribution are illustrated empirically by means of six datasets that describe the survival of cancer patients. The results of the analysis indicate the promising performance of the alpha power Kumaraswamy Weibull distribution in practice compared with other competing distributions.


Introduction
Many statistical distributions have been extensively used for analysing time-to-event data, also referred to as survival or reliability data, in different areas of application, including the medical field. Medical scientists are particularly interested in studying the survival of patients with cancer in applied research. Such research often requires special statistical attention and adjustments when choosing an appropriate model that describes the survival data accurately and yields reliable results and valid inferences. The Weibull distribution [1] can be considered one of the most popular distributions for modeling such mortality and failure data. However, the classical two-parameter Weibull distribution is less suitable when the data show non-monotonic failure rates, because it can model only monotonically increasing or decreasing hazard functions. Therefore, there is in many cases a crucial need to enhance the traditional Weibull distribution for modeling biomedical data. Accordingly, many attempts have been made to extend the baseline Weibull model by adding one or more parameters to achieve more flexibility in generating different shapes of data. To illustrate, [2] suggested the exponentiated Weibull distribution by applying the exponentiated method [3], in which a shape parameter is added to a baseline distribution. The beta-Weibull distribution [4] was introduced based on the beta-generated method of [5]. The Marshall-Olkin extended Weibull distribution [6] was suggested to modify the Weibull distribution using the technique of [7]. This distribution has been applied to fit a dataset representing the remission times of bladder cancer patients. Furthermore, the Maxwell-Weibull distribution was introduced by [8] to model lifetime data.
On the basis of the zero truncated Poisson model, [9] proposed a new compound distribution called the quasi Poisson Burr X exponentiated Weibull distribution, which accommodated many important failure rates. Moreover, in a recent study, [10] have derived a bimodal form of the Weibull distribution.
Researchers have shown a keen interest in developing new methods for expanding lifetime distributions. [11] developed a method that adds two extra shape parameters a, b > 0 to an arbitrary baseline distribution, called the Kumaraswamy generalized (Kum-G) class, with cumulative distribution function (cdf) defined as
\[ F(x; a, b, \theta) = 1 - \left[1 - G(x;\theta)^{a}\right]^{b}, \]
where X is a continuous random variable whose baseline cdf is G(x;θ) with a vector of parameter(s) θ. A number of studies have applied this method to develop new distributions, such as the Kumaraswamy Gumbel by [12], the Kumaraswamy Birnbaum-Saunders by [13], the Kumaraswamy Burr XII distribution by [14], the Kumaraswamy generalized Rayleigh distribution by [15], the Kumaraswamy Laplace distribution by [16], the Kumaraswamy half-logistic distribution by [17], the Kumaraswamy exponentiated Weibull by [18], the Kumaraswamy Marshall-Olkin exponential distribution by [19] and the Kumaraswamy Pareto IV distribution by [20], among others. [21] introduced the Kumaraswamy Weibull (KumW) distribution as a generalization of the Weibull distribution and demonstrated its flexibility in fitting failure data. This distribution is obtained by taking the baseline cdf G(x) = 1 - e^{-(λx)^c} of the Weibull distribution with scale parameter λ > 0 and shape parameter c > 0. Thus, the cdf and probability density function (pdf) of the KumW are obtained respectively as
\[ F_{KumW}(x; a, b, c, \lambda) = 1 - \left[1 - \left(1 - e^{-(\lambda x)^{c}}\right)^{a}\right]^{b} \tag{1} \]
and
\[ f_{KumW}(x; a, b, c, \lambda) = a b c \lambda^{c} x^{c-1} e^{-(\lambda x)^{c}} \left(1 - e^{-(\lambda x)^{c}}\right)^{a-1} \left[1 - \left(1 - e^{-(\lambda x)^{c}}\right)^{a}\right]^{b-1}. \tag{2} \]
The KumW distribution has been considered by several authors. For example, [22,23] discussed different types of statistical inference for constant-stress accelerated life tests based on censored sampling data from the KumW distribution, [24] discussed some Bayesian analyses for the KumW distribution, and [25] considered a regression model for bivariate random variables based on the bivariate KumW distribution. Although the KumW has described many datasets well, it has been modified by some authors.
For instance, [26] generalised the KumW by considering the new modified Kumaraswamy-G class in [27]. Additionally, [28] considered the exponentiated class in [3] to generalise the KumW. More recently, [29] generalised the KumW by considering the transmuted class in [30].
On the other hand, [31] suggested a new approach, called the alpha power transformation (APT), for generating distributions with an additional parameter α in order to add more flexibility. The APT of an arbitrary baseline cdf G and pdf g of a random variable X with a vector of parameter(s) θ is obtained as
\[ F_{APT}(x) = \begin{cases} \dfrac{\alpha^{G(x)} - 1}{\alpha - 1}, & \alpha > 0,\ \alpha \neq 1, \\[2mm] G(x), & \alpha = 1, \end{cases} \tag{3} \]
with the corresponding pdf
\[ f_{APT}(x) = \begin{cases} \dfrac{\log \alpha}{\alpha - 1}\, g(x)\, \alpha^{G(x)}, & \alpha > 0,\ \alpha \neq 1, \\[2mm] g(x), & \alpha = 1. \end{cases} \tag{4} \]
[31] applied their approach to the one-parameter exponential distribution to develop the two-parameter alpha power exponential distribution. Several authors have applied the APT method to extend existing distributions in the literature. Examples include the alpha power Weibull distribution by [32,33], the alpha power inverse Weibull distribution by [34,35], the alpha power inverted exponential by [36], the alpha power transformed extended exponential distribution by [37], the alpha power transformed power Lindley by [38], the alpha power transformed Lindley by [39], the alpha power transformed inverse Lindley by [40], the alpha power transformed inverse Lomax by [41], the alpha power Maxwell distribution by [42], the alpha power exponentiated inverse Rayleigh by [43] and the alpha power Weibull-exponential by [44], among others.
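As an illustration of the APT cdf, the transformation can be sketched as a function that wraps any baseline cdf. This is a minimal sketch in Python rather than the R code used for the computations in this paper; the names `apt_cdf` and `exp_cdf` are hypothetical, and the unit-rate exponential baseline and α = 2.5 are arbitrary choices made only for illustration.

```python
import math
from typing import Callable


def apt_cdf(G: Callable[[float], float], alpha: float) -> Callable[[float], float]:
    """Wrap a baseline cdf G with the alpha power transformation."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")

    def F(x: float) -> float:
        g = G(x)
        if math.isclose(alpha, 1.0):
            return g  # alpha = 1 recovers the baseline cdf
        return (alpha ** g - 1.0) / (alpha - 1.0)

    return F


def exp_cdf(x: float, rate: float = 1.0) -> float:
    """Baseline exponential cdf (illustrative choice of baseline)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


# Alpha power exponential cdf with an arbitrary value of alpha
ape_cdf = apt_cdf(exp_cdf, alpha=2.5)
```

Because the baseline is passed in as a function, the same wrapper could in principle be applied to the KumW cdf to give the construction used in this paper.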
Motivated by the idea that developing new distributions can overcome some of the issues inherent in existing distributions, the main objective of this paper is to introduce a novel generalization of the Weibull distribution. This distribution is constructed by combining the works of [21,31], introducing a new five-parameter distribution referred to as the alpha power Kumaraswamy Weibull (APKumW) distribution. Compared with other probability distributions presented in the literature, the proposed model offers greater flexibility and adaptability for describing different shapes of hazard rate functions, such as decreasing, increasing, bathtub and upside-down bathtub shapes, which are frequently encountered in real-life data. In particular, given the effectiveness of APT distributions for cancer research indicated by [45], this paper focuses on exploring the adaptability of the proposed distribution for describing survival times by analyzing several cancer datasets. An additional objective is to estimate the unknown model parameters using the maximum likelihood method for both complete and censored cancer datasets.
The rest of the paper is organized as follows. In Section 2, the APKumW distribution is defined and its special cases are presented along with a useful expansion of its pdf. In Section 3, some properties of the proposed distribution are discussed. The maximum likelihood estimators (MLEs) of the distribution parameters are obtained in Section 4 based on uncensored and censored data. Simulation studies are then carried out in Section 5 to assess the performance of the MLEs. Finally, applications of the APKumW distribution to complete and censored datasets are presented in Section 6. All computations throughout this paper were performed using the statistical programming language R.

Alpha power Kumaraswamy Weibull distribution
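The APKumW distribution is obtained in this paper by substituting Eqs (1) and (2) into Eqs (3) and (4), respectively. That is, the random variable X is said to have the APKumW distribution with five parameters θ = {a, b, c, λ, α} if the cdf of X is
\[ F(x) = \begin{cases} \dfrac{\alpha^{1 - [1 - (1 - e^{-(\lambda x)^{c}})^{a}]^{b}} - 1}{\alpha - 1}, & \alpha > 0,\ \alpha \neq 1, \\[2mm] 1 - [1 - (1 - e^{-(\lambda x)^{c}})^{a}]^{b}, & \alpha = 1, \end{cases} \tag{5} \]
and its corresponding pdf is
\[ f(x) = \begin{cases} \dfrac{\log \alpha}{\alpha - 1}\, a b c \lambda^{c} x^{c-1} e^{-(\lambda x)^{c}} (1 - e^{-(\lambda x)^{c}})^{a-1} [1 - (1 - e^{-(\lambda x)^{c}})^{a}]^{b-1}\, \alpha^{1 - [1 - (1 - e^{-(\lambda x)^{c}})^{a}]^{b}}, & \alpha > 0,\ \alpha \neq 1, \\[2mm] a b c \lambda^{c} x^{c-1} e^{-(\lambda x)^{c}} (1 - e^{-(\lambda x)^{c}})^{a-1} [1 - (1 - e^{-(\lambda x)^{c}})^{a}]^{b-1}, & \alpha = 1, \end{cases} \tag{6} \]
for x > 0. Additionally, the survival and hazard rate functions of the APKumW distribution are respectively given by
\[ SF(x) = \begin{cases} \dfrac{\alpha - \alpha^{1 - [1 - (1 - e^{-(\lambda x)^{c}})^{a}]^{b}}}{\alpha - 1}, & \alpha \neq 1, \\[2mm] [1 - (1 - e^{-(\lambda x)^{c}})^{a}]^{b}, & \alpha = 1, \end{cases} \tag{7} \]
and
\[ h(x) = \begin{cases} \dfrac{\log(\alpha)\, a b c \lambda^{c} x^{c-1} e^{-(\lambda x)^{c}} (1 - e^{-(\lambda x)^{c}})^{a-1} [1 - (1 - e^{-(\lambda x)^{c}})^{a}]^{b-1}\, \alpha^{1 - [1 - (1 - e^{-(\lambda x)^{c}})^{a}]^{b}}}{\alpha - \alpha^{1 - [1 - (1 - e^{-(\lambda x)^{c}})^{a}]^{b}}}, & \alpha \neq 1, \\[3mm] \dfrac{a b c \lambda^{c} x^{c-1} e^{-(\lambda x)^{c}} (1 - e^{-(\lambda x)^{c}})^{a-1}}{1 - (1 - e^{-(\lambda x)^{c}})^{a}}, & \alpha = 1. \end{cases} \tag{8} \]
The parameter α incorporates skewness into the base distribution. The APKumW model is therefore suitable for describing positively skewed patterns in biomedical and public health data. Fig 1 displays some of the shapes that the pdf and hazard functions of the APKumW distribution can take for different values of its parameters. These different behaviours indicate the flexibility and adaptability of the APKumW distribution in fitting a variety of data shapes. Table 1 shows important special models of the APKumW distribution.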
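The four functions defined above can be sketched numerically as follows. This is a minimal Python sketch under the parametrisation used in this section, not the R code used for the paper; all function names are hypothetical, and the far-tail guards are a pragmatic assumption to avoid floating-point underflow rather than part of the model.

```python
import math


def _z(x, c, lam):
    """Weibull cdf 1 - exp(-(lam*x)^c), computed stably for small x."""
    return -math.expm1(-((lam * x) ** c))


def kumw_cdf(x, a, b, c, lam):
    """KumW cdf, Eq (1)."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 - _z(x, c, lam) ** a) ** b


def kumw_pdf(x, a, b, c, lam):
    """KumW pdf, Eq (2)."""
    if x <= 0:
        return 0.0
    z = _z(x, c, lam)
    w = 1.0 - z ** a
    if w <= 0.0:  # far right tail: the density underflows to zero
        return 0.0
    return (a * b * c * lam ** c * x ** (c - 1) * (1.0 - z)
            * z ** (a - 1) * w ** (b - 1))


def apkumw_cdf(x, a, b, c, lam, alpha):
    """APKumW cdf, Eq (5)."""
    G = kumw_cdf(x, a, b, c, lam)
    if math.isclose(alpha, 1.0):
        return G
    return (alpha ** G - 1.0) / (alpha - 1.0)


def apkumw_pdf(x, a, b, c, lam, alpha):
    """APKumW pdf, Eq (6)."""
    g = kumw_pdf(x, a, b, c, lam)
    if math.isclose(alpha, 1.0):
        return g
    G = kumw_cdf(x, a, b, c, lam)
    return math.log(alpha) / (alpha - 1.0) * g * alpha ** G


def apkumw_survival(x, a, b, c, lam, alpha):
    """APKumW survival function, Eq (7)."""
    return 1.0 - apkumw_cdf(x, a, b, c, lam, alpha)


def apkumw_hazard(x, a, b, c, lam, alpha):
    """APKumW hazard rate, Eq (8); not guarded against a zero survival value."""
    return apkumw_pdf(x, a, b, c, lam, alpha) / apkumw_survival(x, a, b, c, lam, alpha)
```

Evaluating `apkumw_hazard` on a grid of x values for a few parameter choices is one way to explore hazard shapes of the kind shown in Fig 1.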

Expansion of the probability density function
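Using the power series expansion
\[ \alpha^{-z} = \sum_{k=0}^{\infty} \frac{(-\log \alpha)^{k}}{k!}\, z^{k}, \tag{9} \]
the pdf in Eq (6), for α ≠ 1, can be written as
\[ f(x) = \frac{\alpha \log \alpha}{\alpha - 1}\, a b c \lambda^{c} x^{c-1} e^{-(\lambda x)^{c}} (1 - e^{-(\lambda x)^{c}})^{a-1} \sum_{k=0}^{\infty} \frac{(-\log \alpha)^{k}}{k!} \left[1 - (1 - e^{-(\lambda x)^{c}})^{a}\right]^{b(k+1)-1}. \]
Then, the binomial expansion
\[ (1 - z)^{\beta} = \sum_{i=0}^{\infty} (-1)^{i} \binom{\beta}{i} z^{i}, \qquad |z| < 1, \tag{10} \]
is applied twice to obtain a useful expansion of the pdf of the APKumW as follows
\[ f(x) = \sum_{k=0}^{\infty} \sum_{j=0}^{\infty} \sum_{i=0}^{\infty} w_{k,j,i}\, x^{c-1} e^{-(i+1)(\lambda x)^{c}}, \tag{11} \]
where
\[ w_{k,j,i} = \frac{\alpha \log \alpha}{\alpha - 1}\, a b c \lambda^{c}\, \frac{(-1)^{k+j+i} (\log \alpha)^{k}}{k!} \binom{b(k+1)-1}{j} \binom{a(j+1)-1}{i}. \]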

Properties of alpha power Kumaraswamy Weibull distribution
Some properties of the APKumW distribution are considered in the following subsections.

Simulation, quantiles and median
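To simulate a random variable from the APKumW distribution with α ≠ 1, Eq (5) can be inverted to obtain
\[ X = \frac{1}{\lambda} \left\{ -\log\left[ 1 - \left( 1 - \left( 1 - \frac{\log(1 + (\alpha - 1)U)}{\log \alpha} \right)^{1/b} \right)^{1/a} \right] \right\}^{1/c}, \tag{12} \]
where U is a random variable that follows a uniform (0, 1) distribution. Also, the p-th quantile of the APKumW distribution, for 0 < p < 1, is given by
\[ x_{p} = \frac{1}{\lambda} \left\{ -\log\left[ 1 - \left( 1 - \left( 1 - \frac{\log(1 + (\alpha - 1)p)}{\log \alpha} \right)^{1/b} \right)^{1/a} \right] \right\}^{1/c}. \tag{13} \]
Consequently, for p = 1/2, the median of the APKumW can be obtained as
\[ x_{0.5} = \frac{1}{\lambda} \left\{ -\log\left[ 1 - \left( 1 - \left( 1 - \frac{\log((\alpha + 1)/2)}{\log \alpha} \right)^{1/b} \right)^{1/a} \right] \right\}^{1/c}. \tag{14} \]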
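The inversion described above can be sketched as follows. This is a minimal Python sketch, assuming the closed-form inverse of the cdf in Eq (5); the function names and the optional `seed` argument are hypothetical, and the paper's own computations were carried out in R.

```python
import math
import random


def apkumw_cdf(x, a, b, c, lam, alpha):
    """APKumW cdf of Eq (5), used here for a round-trip check."""
    if x <= 0:
        return 0.0
    z = -math.expm1(-((lam * x) ** c))
    G = 1.0 - (1.0 - z ** a) ** b
    if math.isclose(alpha, 1.0):
        return G
    return (alpha ** G - 1.0) / (alpha - 1.0)


def apkumw_quantile(p, a, b, c, lam, alpha):
    """Quantile function obtained by inverting the cdf, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if math.isclose(alpha, 1.0):
        G = p  # no alpha power step when alpha = 1
    else:
        G = math.log1p((alpha - 1.0) * p) / math.log(alpha)  # undo the APT
    z = (1.0 - (1.0 - G) ** (1.0 / b)) ** (1.0 / a)          # undo the Kum-G step
    return (-math.log1p(-z)) ** (1.0 / c) / lam              # undo the Weibull cdf


def apkumw_sample(n, a, b, c, lam, alpha, seed=None):
    """Inverse transform sampling: apply the quantile function to U(0, 1) draws."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # random() may return exactly 0.0, which is outside (0, 1)
            draws.append(apkumw_quantile(u, a, b, c, lam, alpha))
    return draws
```

For example, `apkumw_quantile(0.5, a, b, c, lam, alpha)` gives the median, and applying `apkumw_cdf` to a computed quantile should return the original probability up to rounding error.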

Moments
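The r-th moment of a random variable X is given by
\[ \mu'_{r} = E(X^{r}) = \int_{0}^{\infty} x^{r} f(x)\, dx. \]
Then, the r-th moment of the APKumW is given from Eq (11) as
\[ \mu'_{r} = \frac{\alpha \log \alpha}{\alpha - 1}\, a b c \lambda^{c} \sum_{k=0}^{\infty} \sum_{j=0}^{\infty} \sum_{i=0}^{\infty} \frac{(-1)^{k+j+i} (\log \alpha)^{k}}{k!} \binom{b(k+1)-1}{j} \binom{a(j+1)-1}{i} \int_{0}^{\infty} x^{r+c-1} e^{-(i+1)(\lambda x)^{c}}\, dx. \]
By letting u = (i+1)(λx)^c, the r-th moment can be obtained as
\[ \mu'_{r} = \frac{\alpha \log \alpha}{\alpha - 1}\, \frac{a b}{\lambda^{r}}\, \Gamma\!\left(\frac{r}{c} + 1\right) \sum_{k=0}^{\infty} \sum_{j=0}^{\infty} \sum_{i=0}^{\infty} \frac{(-1)^{k+j+i} (\log \alpha)^{k}}{k!\, (i+1)^{r/c+1}} \binom{b(k+1)-1}{j} \binom{a(j+1)-1}{i}, \tag{15} \]
where Γ(.) is the gamma function. Subsequently, the mean is obtained by substituting r = 1 in Eq (15), and the variance follows as μ'_2 - (μ'_1)^2 after substituting r = 2. The moment generating function of a random variable X is defined as
\[ M_{X}(t) = E(e^{tX}) = \int_{0}^{\infty} e^{tx} f(x)\, dx. \]
That is, using the power series expansion of the exponential function
\[ e^{tx} = \sum_{m=0}^{\infty} \frac{t^{m} x^{m}}{m!}, \]
the moment generating function of a random variable X whose pdf is given in Eq (6) can be obtained similarly as
\[ M_{X}(t) = \sum_{m=0}^{\infty} \frac{t^{m}}{m!}\, \mu'_{m} = \frac{\alpha \log \alpha}{\alpha - 1}\, a b \sum_{m=0}^{\infty} \sum_{k=0}^{\infty} \sum_{j=0}^{\infty} \sum_{i=0}^{\infty} \frac{t^{m}}{m!\, \lambda^{m}}\, \Gamma\!\left(\frac{m}{c} + 1\right) \frac{(-1)^{k+j+i} (\log \alpha)^{k}}{k!\, (i+1)^{m/c+1}} \binom{b(k+1)-1}{j} \binom{a(j+1)-1}{i}. \]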
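A simple way to cross-check a moment formula is to integrate x^r f(x) numerically. The sketch below, in Python, approximates the r-th moment of the APKumW with a midpoint rule; the function names, the integration limit `upper` and the number of `steps` are illustrative assumptions, and the limit must be large enough for the chosen parameters that the neglected tail is negligible. In the Weibull special case (a = b = 1, α = 1), the result can be compared with the known value Γ(r/c + 1)/λ^r.

```python
import math


def apkumw_pdf(x, a, b, c, lam, alpha):
    """APKumW pdf of Eq (6); reduces to the KumW pdf when alpha = 1."""
    if x <= 0:
        return 0.0
    z = -math.expm1(-((lam * x) ** c))  # 1 - exp(-(lam*x)^c)
    w = 1.0 - z ** a
    if w <= 0.0:  # far right tail: the density underflows to zero
        return 0.0
    g = (a * b * c * lam ** c * x ** (c - 1) * (1.0 - z)
         * z ** (a - 1) * w ** (b - 1))
    if math.isclose(alpha, 1.0):
        return g
    return math.log(alpha) / (alpha - 1.0) * g * alpha ** (1.0 - w ** b)


def raw_moment(r, params, upper=40.0, steps=100_000):
    """Approximate E[X^r] by the midpoint rule on (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** r * apkumw_pdf(x, *params)
    return total * h


# Weibull special case: a = b = 1 and alpha = 1, with c = 1.5 and lam = 0.8
weibull_mean = raw_moment(1, (1.0, 1.0, 1.5, 0.8, 1.0))
exact_mean = math.gamma(1.0 / 1.5 + 1.0) / 0.8
```

The mean and variance for other parameter values could be approximated in the same way from `raw_moment(1, params)` and `raw_moment(2, params)`.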

Rényi entropy
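The Rényi entropy of a random variable X is a measure of the variation of uncertainty and is given by
\[ I_{\nu}(X) = \frac{1}{1 - \nu} \log \int_{0}^{\infty} f(x)^{\nu}\, dx, \qquad \nu > 0,\ \nu \neq 1. \]
Then from Eq (6), for α ≠ 1, we have
\[ f(x)^{\nu} = \left(\frac{\log \alpha}{\alpha - 1}\right)^{\nu} (a b c \lambda^{c})^{\nu} x^{\nu(c-1)} e^{-\nu(\lambda x)^{c}} (1 - e^{-(\lambda x)^{c}})^{\nu(a-1)} \left[1 - (1 - e^{-(\lambda x)^{c}})^{a}\right]^{\nu(b-1)} \alpha^{\nu\{1 - [1 - (1 - e^{-(\lambda x)^{c}})^{a}]^{b}\}}. \]
Applying Eq (9) once and Eq (10) twice, we obtain
\[ f(x)^{\nu} = \left(\frac{\alpha \log \alpha}{\alpha - 1}\right)^{\nu} (a b c \lambda^{c})^{\nu} \sum_{k=0}^{\infty} \sum_{j=0}^{\infty} \sum_{i=0}^{\infty} \frac{(-1)^{k+j+i} (\nu \log \alpha)^{k}}{k!} \binom{\nu(b-1)+bk}{j} \binom{\nu(a-1)+aj}{i} x^{\nu(c-1)} e^{-(i+\nu)(\lambda x)^{c}}. \]
Then,
\[ I_{\nu}(X) = \frac{1}{1 - \nu} \log \left[ \left(\frac{\alpha \log \alpha}{\alpha - 1}\right)^{\nu} (a b c \lambda^{c})^{\nu} \sum_{k=0}^{\infty} \sum_{j=0}^{\infty} \sum_{i=0}^{\infty} \frac{(-1)^{k+j+i} (\nu \log \alpha)^{k}}{k!} \binom{\nu(b-1)+bk}{j} \binom{\nu(a-1)+aj}{i} \int_{0}^{\infty} x^{\nu(c-1)} e^{-(i+\nu)(\lambda x)^{c}}\, dx \right]. \]
By letting u = (i+ν)(λx)^c, and provided that ν(c-1) + 1 > 0, the Rényi entropy for the APKumW can be expressed as
\[ I_{\nu}(X) = \frac{1}{1 - \nu} \log \left[ \left(\frac{\alpha \log \alpha}{\alpha - 1}\right)^{\nu} (a b)^{\nu} (c \lambda)^{\nu - 1} \sum_{k=0}^{\infty} \sum_{j=0}^{\infty} \sum_{i=0}^{\infty} \frac{(-1)^{k+j+i} (\nu \log \alpha)^{k}}{k!} \binom{\nu(b-1)+bk}{j} \binom{\nu(a-1)+aj}{i} \frac{\Gamma\!\left(\frac{\nu(c-1)+1}{c}\right)}{(i+\nu)^{\frac{\nu(c-1)+1}{c}}} \right]. \]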

Order statistics
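Suppose that F(x) and f(x) are respectively the cdf and pdf of n independent and identically distributed random variables X_1, X_2, ..., X_n, and let X_{1:n} < X_{2:n} < ... < X_{n:n} be their corresponding order statistics. Then, the pdf of the s-th order statistic can be obtained as
\[ f_{s:n}(x) = \frac{n!}{(s-1)!\,(n-s)!}\, f(x)\, F(x)^{s-1}\, [1 - F(x)]^{n-s}. \]
Using the binomial theorem, we have
\[ f_{s:n}(x) = \frac{n!}{(s-1)!\,(n-s)!} \sum_{l=0}^{n-s} (-1)^{l} \binom{n-s}{l} f(x)\, F(x)^{s+l-1}. \]
Substituting Eqs (5) and (6), for α ≠ 1, and using the binomial theorem again, we get
\[ f_{s:n}(x) = \frac{n!}{(s-1)!\,(n-s)!} \sum_{l=0}^{n-s} \sum_{m=0}^{s+l-1} \frac{(-1)^{s+m-1} \log \alpha}{(\alpha - 1)^{s+l}} \binom{n-s}{l} \binom{s+l-1}{m} f_{KumW}(x)\, \alpha^{(m+1) F_{KumW}(x)}, \]
where F_{KumW}(x) and f_{KumW}(x) are the KumW cdf and pdf in Eqs (1) and (2).
Then, using the series expansion in Eq (9) and applying the binomial theorem in Eq (10) twice, we obtain
\[ f_{s:n}(x) = \frac{n!\, a b c \lambda^{c}}{(s-1)!\,(n-s)!} \sum_{l=0}^{n-s} \sum_{m=0}^{s+l-1} \sum_{k=0}^{\infty} \sum_{j=0}^{\infty} \sum_{i=0}^{\infty} \frac{(-1)^{s+m-1+k+j+i}\, \alpha^{m+1} (m+1)^{k} (\log \alpha)^{k+1}}{(\alpha - 1)^{s+l}\, k!} \binom{n-s}{l} \binom{s+l-1}{m} \binom{b(k+1)-1}{j} \binom{a(j+1)-1}{i} x^{c-1} e^{-(i+1)(\lambda x)^{c}}. \]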

Parameter estimation for alpha power Kumaraswamy Weibull distribution
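The maximum likelihood method is applied to estimate the parameters of the APKumW distribution. That is, if x_1, x_2, ..., x_n is a random sample from the APKumW distribution with unknown parameter vector θ = (a, b, c, λ, α), α ≠ 1, then the log-likelihood function (ℓ) can be defined as
\[ \ell = n \log\!\left(\frac{\log \alpha}{\alpha - 1}\right) + n \log(abc) + n c \log \lambda + (c-1) \sum_{i=1}^{n} \log x_{i} - \sum_{i=1}^{n} (\lambda x_{i})^{c} + (a-1) \sum_{i=1}^{n} \log z_{i} + (b-1) \sum_{i=1}^{n} \log(1 - z_{i}^{a}) + \log \alpha \sum_{i=1}^{n} \left[1 - (1 - z_{i}^{a})^{b}\right], \tag{20} \]
where z_i = 1 - e^{-(λx_i)^c}. Writing z_i^{(c)} = ∂z_i/∂c = (λx_i)^c log(λx_i) e^{-(λx_i)^c} and z_i^{(λ)} = ∂z_i/∂λ = c λ^{c-1} x_i^c e^{-(λx_i)^c}, the associated nonlinear equations for the partial derivatives of ℓ with respect to each parameter are given as
\[ \frac{\partial \ell}{\partial a} = \frac{n}{a} + \sum_{i=1}^{n} \log z_{i} - (b-1) \sum_{i=1}^{n} \frac{z_{i}^{a} \log z_{i}}{1 - z_{i}^{a}} + b \log \alpha \sum_{i=1}^{n} (1 - z_{i}^{a})^{b-1} z_{i}^{a} \log z_{i}, \tag{21} \]
\[ \frac{\partial \ell}{\partial b} = \frac{n}{b} + \sum_{i=1}^{n} \log(1 - z_{i}^{a}) - \log \alpha \sum_{i=1}^{n} (1 - z_{i}^{a})^{b} \log(1 - z_{i}^{a}), \tag{22} \]
\[ \frac{\partial \ell}{\partial c} = \frac{n}{c} + n \log \lambda + \sum_{i=1}^{n} \log x_{i} - \sum_{i=1}^{n} (\lambda x_{i})^{c} \log(\lambda x_{i}) + (a-1) \sum_{i=1}^{n} \frac{z_{i}^{(c)}}{z_{i}} - a(b-1) \sum_{i=1}^{n} \frac{z_{i}^{a-1} z_{i}^{(c)}}{1 - z_{i}^{a}} + a b \log \alpha \sum_{i=1}^{n} (1 - z_{i}^{a})^{b-1} z_{i}^{a-1} z_{i}^{(c)}, \tag{23} \]
\[ \frac{\partial \ell}{\partial \lambda} = \frac{n c}{\lambda} - c \lambda^{c-1} \sum_{i=1}^{n} x_{i}^{c} + (a-1) \sum_{i=1}^{n} \frac{z_{i}^{(\lambda)}}{z_{i}} - a(b-1) \sum_{i=1}^{n} \frac{z_{i}^{a-1} z_{i}^{(\lambda)}}{1 - z_{i}^{a}} + a b \log \alpha \sum_{i=1}^{n} (1 - z_{i}^{a})^{b-1} z_{i}^{a-1} z_{i}^{(\lambda)}, \tag{24} \]
and
\[ \frac{\partial \ell}{\partial \alpha} = \frac{1}{\alpha} \frac{n}{\log(\alpha)} - \frac{n}{\alpha - 1} - \frac{1}{\alpha} \sum_{i=1}^{n} \left[(1 - z_{i}^{a})^{b} - 1\right]. \tag{25} \]
Then, the MLEs of the unknown parameters can be obtained by equating Eqs (21) to (25) to zero and solving them simultaneously. In particular, a numerical iterative approach, such as the Newton-Raphson algorithm, should be applied to solve these equations. Alternatively, software such as R can be used to maximise Eq (20) directly and obtain the MLEs.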
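The complete-data log-likelihood can be sketched as a function of the five parameters. This is a minimal Python sketch, assuming the APKumW pdf in Eq (6); the name `apkumw_loglik` is hypothetical, and returning minus infinity outside the parameter space is a common convention for numerical optimisers rather than something prescribed in this paper.

```python
import math


def apkumw_loglik(data, a, b, c, lam, alpha):
    """Log-likelihood of an i.i.d. APKumW sample (complete data)."""
    if min(a, b, c, lam, alpha) <= 0 or any(x <= 0 for x in data):
        return -math.inf  # outside the parameter or sample space
    n = len(data)
    ll = n * math.log(a * b * c) + n * c * math.log(lam)
    if not math.isclose(alpha, 1.0):
        # normalising constant log(alpha)/(alpha - 1); it equals 1 when alpha = 1
        ll += n * math.log(math.log(alpha) / (alpha - 1.0))
    for x in data:
        t = (lam * x) ** c
        z = -math.expm1(-t)  # 1 - exp(-(lam*x)^c)
        w = 1.0 - z ** a
        if z <= 0.0 or w <= 0.0:
            return -math.inf  # numerical underflow in the extreme tails
        ll += ((c - 1.0) * math.log(x) - t
               + (a - 1.0) * math.log(z)
               + (b - 1.0) * math.log(w)
               + math.log(alpha) * (1.0 - w ** b))
    return ll
```

In practice the negative of this function would be handed to a general-purpose numerical optimiser, with positivity of the parameters enforced either through bounds or through a log reparametrisation.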
Studying survival times often results in censored observations, meaning that the period of interest is only partially observed for some subjects. Right censoring arises in medical studies when some patients are lost to follow-up and their exact event times cannot be determined. The most common form of right censoring encountered in survival analysis is type I right censoring, which occurs when a study is conducted over a specified period of time that ends before all units have failed. To illustrate, consider a study of a random sample of n patients in which each patient is assigned a censoring time Y_i, i = 1, ..., n, that is, the time between entry and the end of the study, and let X_i, i = 1, ..., n, be the failure time of the i-th patient. The X_i's and Y_i's are assumed to be independent and to follow the APKumW distribution in Eq (6) and a non-informative distribution, respectively. For the i-th patient, let t_i = min(x_i, y_i) denote the observed time and define the indicator
\[ \delta_{i} = \begin{cases} 1, & \text{if failure has occurred}, \\ 0, & \text{if censoring has occurred}. \end{cases} \]
Then, the log-likelihood function (ℓ) will be
\[ \ell = \sum_{i=1}^{n} \left[ \delta_{i} \log f(t_{i}) + (1 - \delta_{i}) \log SF(t_{i}) \right], \tag{26} \]
where f(.) and SF(.) are respectively defined in Eqs (6) and (7). In order to obtain the MLEs, the log-likelihood in Eq (26) can be maximized numerically.
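The censored log-likelihood can be sketched in the same spirit: each observed failure contributes the log-density and each censored time contributes the log-survival. This is a minimal Python sketch with hypothetical function names; `events` is assumed to hold 1 for an observed failure and 0 for a right-censored time, and no guarding against extreme-tail underflow is included.

```python
import math


def _log_f_and_log_sf(t, a, b, c, lam, alpha):
    """Return (log f(t), log SF(t)) for the APKumW, Eqs (6) and (7)."""
    z = -math.expm1(-((lam * t) ** c))  # 1 - exp(-(lam*t)^c)
    w = 1.0 - z ** a                    # KumW survival is w**b
    log_g = (math.log(a * b * c) + c * math.log(lam) + (c - 1.0) * math.log(t)
             - (lam * t) ** c + (a - 1.0) * math.log(z) + (b - 1.0) * math.log(w))
    if math.isclose(alpha, 1.0):
        return log_g, b * math.log(w)
    G = 1.0 - w ** b
    log_f = math.log(math.log(alpha) / (alpha - 1.0)) + log_g + G * math.log(alpha)
    log_sf = math.log((alpha - alpha ** G) / (alpha - 1.0))
    return log_f, log_sf


def censored_loglik(times, events, a, b, c, lam, alpha):
    """Right-censored log-likelihood: failures use log f, censored times use log SF."""
    ll = 0.0
    for t, d in zip(times, events):
        log_f, log_sf = _log_f_and_log_sf(t, a, b, c, lam, alpha)
        ll += log_f if d == 1 else log_sf
    return ll
```

When every entry of `events` equals 1, this reduces to the complete-data log-likelihood.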

Simulation study
Simulation studies are conducted to evaluate the performance of the MLEs of the five parameters of the APKumW distribution. Each simulation is run over nsim = 1000 iterations, for different sample sizes n and several cases of the true parameter vector θ_tr. The accuracy of each estimator θ̂ can be evaluated using a measure such as the root mean squared error (RMSE), which can be calculated as follows
\[ RMSE(\hat{\theta}) = \sqrt{\frac{1}{nsim} \sum_{i=1}^{nsim} (\hat{\theta}_{i} - \theta_{tr})^{2}}, \]
where θ̂_i denotes the estimate obtained in the i-th iteration. All estimation results are obtained using the "optim" function in R. Table 2 shows the MLEs of the parameters of the APKumW along with their corresponding RMSEs. In general, it can be seen from this table that the MLEs move closer to the true parameter values as the sample size increases. In addition, the RMSE becomes smaller as the sample size n increases, indicating that the estimates are consistent. These results demonstrate that the maximum likelihood method is effective for estimating the parameters of the proposed distribution.
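The RMSE summary used in the simulation study can be sketched as follows. This is a minimal Python sketch; the function names are hypothetical and the replicate estimates in the example are made-up numbers used only to show the calculation, not results from this paper.

```python
import math


def rmse(estimates, true_value):
    """Root mean squared error of replicate estimates around the true value."""
    n_sim = len(estimates)
    return math.sqrt(sum((est - true_value) ** 2 for est in estimates) / n_sim)


def mean_estimate(estimates):
    """Average of the replicate estimates, as typically reported next to the RMSE."""
    return sum(estimates) / len(estimates)


# Toy example with made-up replicate estimates of a parameter whose true value is 2
toy_estimates = [1.0, 3.0]
toy_rmse = rmse(toy_estimates, 2.0)
toy_mean = mean_estimate(toy_estimates)
```

In a full simulation, one such list of estimates would be collected per parameter, per sample size and per true-parameter case.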

Applications
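Six real datasets for cancer patients are fitted using the APKumW distribution. The results obtained using the APKumW distribution are compared with those obtained from the following competing distributions.
The Weibull distribution with the following pdf
\[ f(x) = c \lambda^{c} x^{c-1} e^{-(\lambda x)^{c}}. \]
The Beta Weibull (BW) distribution [4] with the following pdf
\[ f(x) = \frac{c \lambda^{c}}{B(a, b)}\, x^{c-1} \left(1 - e^{-(\lambda x)^{c}}\right)^{a-1} e^{-b(\lambda x)^{c}}, \]
where B(., .) is the beta function.
The KumW distribution [21] with the following pdf
\[ f(x) = a b c \lambda^{c} x^{c-1} e^{-(\lambda x)^{c}} (1 - e^{-(\lambda x)^{c}})^{a-1} [1 - (1 - e^{-(\lambda x)^{c}})^{a}]^{b-1}. \]
The exponentiated Kumaraswamy Weibull (EKumW) distribution [47] with the following pdf
\[ f(x) = \alpha a b c \lambda^{c} x^{c-1} e^{-(\lambda x)^{c}} (1 - e^{-(\lambda x)^{c}})^{a-1} [1 - (1 - e^{-(\lambda x)^{c}})^{a}]^{b-1} \left\{1 - [1 - (1 - e^{-(\lambda x)^{c}})^{a}]^{b}\right\}^{\alpha - 1}. \]
The alpha power Weibull (APW) distribution [32] with the following pdf
\[ f(x) = \begin{cases} \dfrac{\log \alpha}{\alpha - 1}\, c \lambda^{c} x^{c-1} e^{-(\lambda x)^{c}} \alpha^{1 - e^{-(\lambda x)^{c}}}, & \alpha \neq 1, \\[2mm] c \lambda^{c} x^{c-1} e^{-(\lambda x)^{c}}, & \alpha = 1. \end{cases} \]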
A variety of tools can be applied to compare competing distributions for a specific dataset and to choose the best-fitting model. To investigate the goodness of fit of the compared distributions, the Akaike Information Criterion (AIC) and the Kolmogorov-Smirnov (KS) statistic, along with its P value, are considered. The preferred distribution is the one with the lowest AIC and KS values and the highest P value of the KS statistic. The plots of the estimated cdf for each of the distributions are compared with the plot of the empirical cdf. Also, the histogram of the observed frequencies is compared with the plots of the expected frequencies for each fitted distribution. The MLEs of the parameters for the five complete datasets, along with their SEs (in parentheses) and the corresponding goodness-of-fit criteria for all the competing models, are presented in Tables 3-7, respectively.
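The two goodness-of-fit measures can be sketched as follows. This is a minimal Python sketch using the standard definitions AIC = 2k - 2ℓ and the one-sample KS distance between the empirical and fitted cdfs; the function names are hypothetical, the P value of the KS statistic is not computed here, and the toy data and the uniform cdf are illustrative only.

```python
def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2*loglik (smaller is better)."""
    return 2.0 * n_params - 2.0 * loglik


def ks_statistic(data, cdf):
    """One-sample KS distance between the empirical cdf and a fitted cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # the empirical cdf jumps from (i-1)/n to i/n at the i-th ordered value
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def uniform_cdf(x):
    """Uniform(0, 1) cdf, used only to make the example easy to check by hand."""
    return min(max(x, 0.0), 1.0)


toy_ks = ks_statistic([0.25, 0.5, 0.75], uniform_cdf)
toy_aic = aic(-100.0, 5)
```

For a fitted model, `cdf` would be the estimated cdf evaluated at the MLEs and `loglik` the maximised log-likelihood.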

Acute bone cancer dataset
[48] considered a simulated dataset representing the survival times (in days) of 73 patients diagnosed with acute bone cancer, as follows: 0.09, 0.76, 1.81, 1.10, 3.72, 0.72, 2.49, 1.00, 0.53,

Table 6. MLEs (SEs) of the parameters and associated goodness-of-fit statistics for the bladder cancer I data.

Table 8. MLEs (SEs) of the parameters and associated goodness-of-fit statistics for the censored data.

Table 8 shows the MLEs and SEs of the unknown parameters of the APKumW distribution for the censored data, obtained by maximizing the log-likelihood function in Eq (26). The table also displays the MLEs and SEs of the unknown parameters of the Weibull and exponentiated Kumaraswamy Weibull (EKumW) distributions based on the censored cancer data. The APKumW has the lowest AIC, which suggests that the distribution can fit censored data well.

Conclusion
Choosing a suitable model for fitting survival data has been a major concern among researchers. One of the most popular distributions for lifetime data is the Weibull distribution. In this paper, the Weibull distribution is extended to provide a new distribution, called the APKumW, for modeling lifetime data. It has several special cases, which have been presented in the paper. A number of statistical characteristics of the proposed distribution have been studied, including survival and hazard functions, quantiles, moments, Rényi entropy and order statistics. The parameters of the APKumW were estimated using the method of maximum likelihood, and the estimators were evaluated through simulation studies, in which the maximum likelihood method performed well. The applications of statistical distributions are essential for medical research and can have a crucial impact on public health, especially for cancer patients. Thus, the usefulness of this distribution is illustrated through its application to real datasets that describe the survival of cancer patients, including both complete and censored cases. The results indicate the superior performance of the APKumW distribution compared with other competing distributions according to different goodness-of-fit criteria. Overall, it is hoped that the proposed APKumW distribution will provide an alternative to existing distributions for modeling positively skewed real data in survival analysis, especially in cancer research.