A new generator for proposing flexible lifetime distributions and its properties

In this paper, we develop a generator for proposing new continuous lifetime distributions. Thanks to a simple transformation involving one additional parameter, every existing lifetime distribution can be rendered more flexible by our construction. We derive stochastic properties of our models and explain how to estimate their parameters by maximum likelihood for complete and censored data, focusing in particular on Type-II, Type-I and random censoring. A Monte Carlo simulation study reveals that the estimators are consistent. To demonstrate the suitability of the proposed generator in practice, the two-parameter Fréchet distribution is taken as baseline distribution. Three real-life applications are carried out, and it is shown that our extension of the Fréchet distribution outperforms existing extensions available in the literature.


Introduction
The modeling and analysis of lifetime phenomena is an important aspect of statistical work in a wide variety of scientific and technological fields. The field of lifetime data analysis has grown and expanded rapidly with respect to methodology, theory, and fields of application. To model real-life phenomena, numerous continuous probability distributions, together with many generalization and transformation methods, have been proposed. These generalizations, obtained either by adding one or more shape parameters or by changing the functional form of the distribution, increase the flexibility of the distributions and allow the phenomena to be modeled more accurately. Extensive developments in software have made it possible to focus less on computational details and have hence simplified estimation.
The following are prominent and highly cited generators and transformations proposed over the past years in the statistical literature for modeling lifetime distributions. [1] transformed the survival function by adding an extra shape parameter. The exponentiated family of distributions, which adds a shape parameter as exponent to an existing cumulative distribution function (cdf), was presented by [2]. The beta-generated family by [3] is based on both Beta type-I and Beta type-II distributions, while the Kumaraswamy-generated family by [4] uses the Kumaraswamy distribution instead of the Beta distribution. [5] pioneered a versatile and flexible gamma-G class of distributions based on the Generalized Gamma distribution.
Let F(x; z) be the cdf of a given random variable depending on some real-valued parameter(s) z. Our approach in this paper consists in enriching this cdf by transforming it into
\[ G(x;\xi) = \frac{\log\{2 - e^{-\lambda F(x;z)}\}}{\log\{2 - e^{-\lambda}\}}, \qquad (1) \]
where ξ = (λ, z) for some positive real-valued shape parameter λ and the parameter z from the baseline distribution. We call this transformation the log-expo transformation (LET). It is inspired by [6], who considered a less versatile transformation of the same type. While their approach only allows modulating the shape of distributions in a fixed way, ours is more flexible since it contains the extra shape parameter λ to regulate the transformation. To evaluate the suitability of the newly proposed transformation, we take the Fréchet distribution by [7] as example of a baseline distribution throughout the rest of this paper. The remainder of the paper is organized as follows. First, the density function of the proposed family is defined and its basic statistical properties are derived. Next, we discuss parameter estimation via maximum likelihood for complete and censored data, together with a submodel likelihood ratio test. A Monte Carlo simulation study then demonstrates the consistency of our estimation procedures. The fitting abilities of our new approach are illustrated by means of three real data sets. Finally, we give concluding remarks, and the Appendix collects the densities of the distributions used in the real data analysis.

The proposed density and its properties
The probability density function (pdf) corresponding to Eq (1) is given by
\[ g(x;\xi) = \frac{\lambda f(x;z)\, e^{-\lambda F(x;z)}}{\log\{2 - e^{-\lambda}\}\,\{2 - e^{-\lambda F(x;z)}\}}, \qquad (3) \]
where F(x; z) and f(x; z) are the cdf and pdf of the arbitrary baseline distribution. The cdf and pdf given in Eqs (1) and (3), respectively, take a fully explicit form once F(x; z) and f(x; z) of a particular baseline distribution are plugged in. The flexibility of the proposed family of distributions is increased by the added shape parameter λ. Hereafter, we say that a random variable X having density Eq (3) is a log-expo transformed random variable.
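As a quick illustration of the generator, Eq (1) and the pdf above can be coded for any baseline distribution; the minimal sketch below uses a standard exponential baseline, chosen purely for illustration, and checks numerically that the transformed function is again a valid cdf with a density integrating to one.

```python
import math

def let_cdf(F, lam):
    """LET cdf of Eq (1): G(x) = log(2 - exp(-lam*F(x))) / log(2 - exp(-lam))."""
    denom = math.log(2.0 - math.exp(-lam))
    return lambda x: math.log(2.0 - math.exp(-lam * F(x))) / denom

def let_pdf(F, f, lam):
    """LET pdf: g(x) = lam*f(x)*exp(-lam*F(x)) / [log(2 - e^-lam) * (2 - exp(-lam*F(x)))]."""
    denom = math.log(2.0 - math.exp(-lam))
    return lambda x: (lam * f(x) * math.exp(-lam * F(x))
                      / (denom * (2.0 - math.exp(-lam * F(x)))))

# Illustrative baseline: standard exponential, F(x) = 1 - e^{-x}, f(x) = e^{-x}.
F = lambda x: 1.0 - math.exp(-x)
f = lambda x: math.exp(-x)
G, g = let_cdf(F, 1.5), let_pdf(F, f, 1.5)

# G inherits the cdf properties of F: G -> 0 as F -> 0 and G -> 1 as F -> 1,
# and g integrates to one (midpoint rule on (0, 40), where the tail mass is negligible).
area = sum(g(0.0005 + 0.001 * k) * 0.001 for k in range(40000))
```

Any baseline cdf/pdf pair can be passed in place of the exponential one; only the pair (F, f) changes, the transformation code stays identical.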

The cdf of the LET-Fréchet (LET-F) distribution, obtained from the two-parameter Fréchet baseline cdf F(x; α, β) = e^{-(β/x)^α}, x > 0, then corresponds to
\[ G(x;\lambda,\alpha,\beta) = \frac{\log\{2 - e^{-\lambda e^{-(\beta/x)^{\alpha}}}\}}{\log\{2 - e^{-\lambda}\}}. \]
Since the LET-F distribution is our red-thread example, we also provide some moment expressions. In Table 1, we give the first four moments μ′_n, n = 1, ..., 4, the standard deviation (SD), coefficient of skewness (CS) and coefficient of kurtosis (CK) for different combinations of parameters. These values are calculated via Mathematica.
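Moments of this kind can also be approximated numerically; the sketch below does so for the LET-F model by a simple midpoint rule, assuming the baseline Fréchet cdf F(x; α, β) = exp(-(β/x)^α). The parameter values are arbitrary illustrations, not those of Table 1, and proper quadrature (as in Mathematica) would be used in practice.

```python
import math

def letf_pdf(x, lam, alpha, beta):
    """pdf of the LET-F model with baseline Frechet cdf F(x) = exp(-(beta/x)**alpha), x > 0."""
    F = math.exp(-((beta / x) ** alpha))
    f = (alpha / beta) * (beta / x) ** (alpha + 1) * F  # baseline Frechet pdf
    return (lam * f * math.exp(-lam * F)
            / (math.log(2.0 - math.exp(-lam)) * (2.0 - math.exp(-lam * F))))

def moment(n, lam, alpha, beta, upper=100.0, steps=100000):
    """Midpoint-rule approximation of E[X^n]; the moment is finite only when alpha > n."""
    h = upper / steps
    return sum(((k + 0.5) * h) ** n * letf_pdf((k + 0.5) * h, lam, alpha, beta) * h
               for k in range(steps))

lam, alpha, beta = 2.0, 5.0, 1.0   # illustrative values with alpha > 4, so four moments exist
m1 = moment(1, lam, alpha, beta)
m2 = moment(2, lam, alpha, beta)
sd = math.sqrt(m2 - m1 * m1)       # SD as in Table 1; CS and CK follow from m3 and m4
```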

Lifetime data analysis and parameter estimation
The data encountered in survival analysis and reliability studies are often censored. This is why, besides classical maximum likelihood estimation, we also show how to estimate the parameters of our new family of distributions when the data are censored. More precisely, we consider Type-II, Type-I and random (right) censoring. These censoring schemes have been employed in numerous fields, especially for crash rates on roads, which are based on censored data. Such data can be handled by using tobit, multinomial logit, mixed logit, or ordered logit/probit models; see, for example, the articles [8-15]. Finally, we develop likelihood ratio tests for testing the suitability of the baseline distributions against our LET extension.

PLOS ONE
Maximum likelihood estimation for complete data proceeds as follows. Let x_1, x_2, ..., x_n be a complete random sample from the LET model, so that the log-likelihood function is
\[ \ell\ell(\xi) = n\log\lambda + \sum_{i=1}^{n}\log f(x_i;z) - \lambda\sum_{i=1}^{n}F(x_i;z) - n\log\log\{2-e^{-\lambda}\} - \sum_{i=1}^{n}\log\{2-e^{-\lambda F(x_i;z)}\}. \]
Differentiating the log-likelihood with respect to λ and z and equating to zero, we get the score equations
\[ \frac{\partial \ell\ell(\xi)}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n}\left[ F(x_i;z) + \frac{e^{-\lambda}}{\{2-e^{-\lambda}\}\log\{2-e^{-\lambda}\}} + \frac{F(x_i;z)\,e^{-\lambda F(x_i;z)}}{2-e^{-\lambda F(x_i;z)}} \right] = 0 \qquad (4) \]
and
\[ \frac{\partial \ell\ell(\xi)}{\partial z} = \sum_{i=1}^{n}\frac{f_z(x_i;z)}{f(x_i;z)} - \lambda\sum_{i=1}^{n}F_z(x_i;z)\left[ 1 + \frac{e^{-\lambda F(x_i;z)}}{2-e^{-\lambda F(x_i;z)}} \right] = 0, \qquad (5) \]
where f_z(x_i; z) = df(x_i; z)/dz and F_z(x_i; z) = dF(x_i; z)/dz. Solving Eqs (4) and (5) gives the maximum likelihood estimates of the unknown parameters λ and z. Typically, this requires numerical optimization techniques such as Newton-Raphson methods, as given in [16] and [17].
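To make the procedure concrete, here is a minimal Newton-Raphson sketch for the score equation in λ, under the simplifying assumption that the baseline parameters z are known, so that only λ is estimated from the baseline-cdf values u_i = F(x_i; z). The helper names and the inversion-based sampler are our own illustrations, not the paper's code.

```python
import math, random

def sample_F_values(n, lam):
    """Draw u_i = F(X_i; z) for X_i from the LET model, by inverting Eq (1):
       G(x) = v  =>  F(x; z) = -log(2 - (2 - e^{-lam})**v) / lam, with v uniform on (0, 1)."""
    base = 2.0 - math.exp(-lam)
    return [-math.log(2.0 - base ** random.random()) / lam for _ in range(n)]

def score(lam, u):
    """Score equation in lam (Eq (4)), with u_i = F(x_i; z) held fixed."""
    n = len(u)
    c = math.exp(-lam) / ((2.0 - math.exp(-lam)) * math.log(2.0 - math.exp(-lam)))
    return n / lam - sum(ui + c + ui * math.exp(-lam * ui) / (2.0 - math.exp(-lam * ui))
                         for ui in u)

def newton_lambda(u, lam0=1.0, tol=1e-8, max_iter=100):
    """Newton-Raphson on score(lam) = 0, with a numerical derivative of the score."""
    lam = lam0
    for _ in range(max_iter):
        h = 1e-6 * max(lam, 1.0)
        deriv = (score(lam + h, u) - score(lam - h, u)) / (2.0 * h)
        lam_new = max(lam - score(lam, u) / deriv, 1e-6)  # keep lam strictly positive
        if abs(lam_new - lam) < tol:
            return lam_new
        lam = lam_new
    return lam

random.seed(3)
u = sample_F_values(2000, 2.0)   # true lambda = 2
lam_hat = newton_lambda(u)
```

In the full problem the z-equation (5) is solved jointly with (4), e.g. by a multivariate Newton step; the one-dimensional version above only illustrates the mechanics.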

Parameter estimation under various types of right censoring
Let x_1, x_2, ..., x_n be the observations of a random sample of size n from the LET model. In what follows, we explain how to perform maximum likelihood estimation for three types of right censoring.

Type-II censoring
In case of Type-II right censoring, t observations out of the n are censored from the right side. The likelihood function then becomes
\[ L(\xi) \propto \prod_{i=1}^{n-t} g(x_{(i)};\xi)\,\big[1 - G(x_{(n-t)};\xi)\big]^{t}, \]
where x_{(i)} is the order statistic of order i, and the log-likelihood function, expressed in terms of the original baseline distribution, reads
\[ \ell\ell(\xi) = \sum_{i=1}^{n-t}\log\left[ \frac{\lambda f(x_{(i)};z)\,e^{-\lambda F(x_{(i)};z)}}{\log\{2-e^{-\lambda}\}\{2-e^{-\lambda F(x_{(i)};z)}\}} \right] + t\log\left[ 1 - \frac{\log\{2-e^{-\lambda F(x_{(n-t)};z)}\}}{\log\{2-e^{-\lambda}\}} \right]. \]
Differentiating this log-likelihood with respect to λ and z yields the score equations (6) and (7); for the λ-equation, the contribution of the uncensored observations takes the form
\[ \frac{n-t}{\lambda} - \sum_{i=1}^{n-t}\left[ F(x_{(i)};z) + \frac{e^{-\lambda}}{\{2-e^{-\lambda}\}\log\{2-e^{-\lambda}\}} + \frac{F(x_{(i)};z)\,e^{-\lambda F(x_{(i)};z)}}{2-e^{-\lambda F(x_{(i)};z)}} \right], \]
to which the derivative of the censored term is added. Setting Expressions (6) and (7) equal to zero gives the maximum likelihood estimates of the unknown parameters λ and z for Type-II right censored data. It is clear that their solution cannot be obtained analytically, and the numerical techniques used in [16] and [17] are required.
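The Type-II censored log-likelihood can be sketched directly from its two ingredients, log g and log(1 - G); the exponential baseline and the helper names below are illustrative assumptions, not the paper's code.

```python
import math

def let_logpdf(x, lam, F, f):
    """log g(x) for the LET model."""
    Fx = F(x)
    return (math.log(lam) + math.log(f(x)) - lam * Fx
            - math.log(math.log(2.0 - math.exp(-lam)))
            - math.log(2.0 - math.exp(-lam * Fx)))

def let_logsurv(x, lam, F):
    """log(1 - G(x)) for the LET model."""
    G = math.log(2.0 - math.exp(-lam * F(x))) / math.log(2.0 - math.exp(-lam))
    return math.log(1.0 - G)

def type2_loglik(sample, t, lam, F, f):
    """Log-likelihood (up to a constant) when the t largest of the n observations
    are Type-II right censored at the largest uncensored order statistic x_(n-t)."""
    xs = sorted(sample)
    obs = xs[: len(xs) - t]
    return (sum(let_logpdf(x, lam, F, f) for x in obs)
            + t * let_logsurv(obs[-1], lam, F))

# Illustrative exponential baseline and a tiny artificial sample:
F = lambda x: 1.0 - math.exp(-x)
f = lambda x: math.exp(-x)
data = [0.3, 0.9, 1.4, 2.2, 3.1, 4.0]
ll_cens = type2_loglik(data, 2, 1.5, F, f)   # two largest observations censored
ll_full = type2_loglik(data, 0, 1.5, F, f)   # t = 0 recovers the complete-data case
```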

Type-I censoring
Suppose that a random sample of n units from G(x; ξ) is observed for a predefined time x_c, after which the process terminates. We observe the lifetimes of δ units before termination, and the remaining n − δ observations are censored. Thus, the lifetime x_i is observed only if x_i ≤ x_c, for i = 1, 2, ..., n.
Writing I_i = 1 if x_i ≤ x_c and I_i = 0 otherwise, the likelihood function can be written as
\[ L(\xi) = \big[S(x_c;\xi)\big]^{n-\delta} \prod_{i=1}^{n} g(x_i;\xi)^{I_i}, \]
where S = 1 − G denotes the survival function, and the log-likelihood function is given by
\[ \ell\ell(\xi) = (n-\delta)\log\left[ 1 - \frac{\log\{2-e^{-\lambda F(x_c;z)}\}}{\log\{2-e^{-\lambda}\}} \right] + \sum_{i=1}^{n} I_i \log\left[ \frac{\lambda f(x_i;z)\,e^{-\lambda F(x_i;z)}}{\log\{2-e^{-\lambda}\}\{2-e^{-\lambda F(x_i;z)}\}} \right]. \]
The score equations, and associated maximum likelihood estimates, are obtained along the same lines as in the previous sections. Their solution cannot be obtained analytically, and the numerical techniques given in [16] and [17] are required.
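Type-I censoring differs from Type-II only in that the censoring point x_c is fixed in advance while the number of censored units is random; a self-contained sketch (exponential baseline for illustration, helper names our own):

```python
import math

def let_logpdf(x, lam, F, f):
    """log g(x) for the LET model."""
    Fx = F(x)
    return (math.log(lam) + math.log(f(x)) - lam * Fx
            - math.log(math.log(2.0 - math.exp(-lam)))
            - math.log(2.0 - math.exp(-lam * Fx)))

def let_logsurv(x, lam, F):
    """log(1 - G(x)) for the LET model."""
    G = math.log(2.0 - math.exp(-lam * F(x))) / math.log(2.0 - math.exp(-lam))
    return math.log(1.0 - G)

def type1_loglik(sample, x_c, lam, F, f):
    """Type-I right censoring at the fixed time x_c: units surviving past x_c
    contribute log S(x_c), the delta observed units contribute log g(x_i)."""
    obs = [x for x in sample if x <= x_c]
    n_cens = len(sample) - len(obs)                  # n - delta censored units
    ll = sum(let_logpdf(x, lam, F, f) for x in obs)
    if n_cens:
        ll += n_cens * let_logsurv(x_c, lam, F)
    return ll

F = lambda x: 1.0 - math.exp(-x)
f = lambda x: math.exp(-x)
data = [0.3, 0.9, 1.4, 2.2, 3.1, 4.0]
ll = type1_loglik(data, 2.0, 1.5, F, f)   # x_c = 2.0 censors three of the six values
```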

Random censoring
Suppose a random sample consists of n observations T_1, T_2, ..., T_n from a continuous failure distribution G(t; ξ), and consider further random censoring variables C_1, C_2, ..., C_n drawn independently from a censoring distribution H(c; ξ). The observations for right censored data are presented as (X_i, I_i), i = 1, 2, ..., n, where X_i = min(T_i, C_i), and I_i = 1 if T_i ≤ C_i (failure observed) and I_i = 0 otherwise (censored).

The likelihood function for randomly censored data x_1, x_2, ..., x_n can be written as
\[ L(\xi) \propto \prod_{i=1}^{n} g(x_i;\xi)^{I_i}\,\big[1 - G(x_i;\xi)\big]^{1-I_i}, \]
so that the log-likelihood function reads
\[ \ell\ell(\xi) = \sum_{i=1}^{n} I_i \log\left[ \frac{\lambda f(x_i;z)\,e^{-\lambda F(x_i;z)}}{\log\{2-e^{-\lambda}\}\{2-e^{-\lambda F(x_i;z)}\}} \right] + \sum_{i=1}^{n} (1-I_i)\log\left[ 1 - \frac{\log\{2-e^{-\lambda F(x_i;z)}\}}{\log\{2-e^{-\lambda}\}} \right]. \]
The score equations, and associated maximum likelihood estimates, are obtained along the same lines as in the previous sections. Their solution cannot be obtained analytically, and numerical techniques such as those used in [16] and [17] are required.
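The construction of the pairs (X_i, I_i) and the resulting log-likelihood can be sketched as follows. The exponential failure and censoring distributions are stand-ins chosen purely for illustration; under independent censoring with these rates, the expected uncensored fraction is 1/(1 + 0.5) = 2/3.

```python
import math, random

random.seed(1)
n = 1000
T = [random.expovariate(1.0) for _ in range(n)]   # failure times (illustrative model)
C = [random.expovariate(0.5) for _ in range(n)]   # independent censoring times
X = [min(t, c) for t, c in zip(T, C)]
I = [1 if t <= c else 0 for t, c in zip(T, C)]    # I_i = 1: failure observed, 0: censored
frac_obs = sum(I) / n

def let_logpdf_u(u, lam):
    """log g expressed through u = F(x; z); the baseline log f term is dropped,
    an illustrative simplification since it does not involve lam."""
    return (math.log(lam) - lam * u
            - math.log(math.log(2.0 - math.exp(-lam)))
            - math.log(2.0 - math.exp(-lam * u)))

def let_logsurv_u(u, lam):
    """log(1 - G) expressed through u = F(x; z)."""
    return math.log(1.0 - math.log(2.0 - math.exp(-lam * u))
                    / math.log(2.0 - math.exp(-lam)))

def random_cens_loglik(x, ind, lam, F):
    """ll = sum_i [ I_i log g(x_i) + (1 - I_i) log(1 - G(x_i)) ], up to the log f terms."""
    return sum(let_logpdf_u(F(xi), lam) if ii else let_logsurv_u(F(xi), lam)
               for xi, ii in zip(x, ind))

F = lambda x: 1.0 - math.exp(-x)   # baseline cdf used in the illustration
ll = random_cens_loglik(X, I, 1.5, F)
```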

Submodel testing
Our LET extension paves the way for submodel testing of the baseline distribution by means of likelihood ratio tests. We denote by \(\hat\xi\) the unconstrained maximum likelihood estimate and by \(\hat\xi_r\) the maximum likelihood estimate under the restricted submodel. For example, testing for the Fréchet distribution against the LET-F model can be achieved by the test statistic \(T_{\mathrm{Fréchet}} = -2\big(\ell\ell(\hat\alpha_r,\hat\beta_r) - \ell\ell(\hat\lambda,\hat\alpha,\hat\beta)\big)\), rejecting H_0: λ = 0 at asymptotic level α against H_1: λ ≠ 0 whenever \(T_{\mathrm{Fréchet}}\) exceeds \(\chi^2_{1,1-\alpha}\), the (1 − α)-quantile of the chi-squared distribution with one degree of freedom.
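The test needs nothing more than the two maximized log-likelihoods; since a chi-squared variable with one degree of freedom is the square of a standard normal, the p-value follows from the normal cdf. A small self-contained sketch:

```python
import math

def lrt_pvalue_1df(ll_restricted, ll_full):
    """Likelihood ratio test with one restricted parameter:
    T = -2*(ll_r - ll_full), and P(chi2_1 > T) = 2*(1 - Phi(sqrt(T)))."""
    T = -2.0 * (ll_restricted - ll_full)
    if T <= 0.0:
        return 1.0
    Phi = 0.5 * (1.0 + math.erf(math.sqrt(T) / math.sqrt(2.0)))
    return 2.0 * (1.0 - Phi)

# Sanity check: the 5% critical value of chi2_1 is about 3.841.
p = lrt_pvalue_1df(-3.841 / 2.0, 0.0)
```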

Monte Carlo simulation results of the LET-F model
We perform a Monte Carlo simulation study in order to evaluate the behavior of the maximum likelihood estimates of the proposed LET-F distribution for complete and censored data. The data were censored 10% from the right using the Type-II and Type-I schemes. We calculate the means, biases and mean-squared errors (MSEs) of each parameter of the LET-F model for different sample sizes n. To obtain the results, the process is replicated N = 10,000 times for n = 20, 30, 50 and 100 for censored data; for complete data, we added the sample sizes 200 and 300. The simulated means, biases and MSEs for complete and censored data are provided in Tables 2 and 3, respectively. We observe that, overall, the estimation procedure works well and that the estimates improve with increasing sample size, as should be the case. It is noteworthy that close-to-zero values of λ are more difficult to estimate, probably because such small values barely activate our transformation, so that the model is hard to distinguish from the baseline model.
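The structure of such a simulation study can be sketched compactly. To keep the illustration fast we again estimate only λ (baseline-cdf values drawn by inverting Eq (1)), use a crude grid-search MLE in place of Newton-Raphson, and run far fewer replications than the paper's N = 10,000; none of these shortcuts is part of the paper's actual setup.

```python
import math, random

def sample_F_values(n, lam):
    """u_i = F(X_i; z) for a LET sample, via inversion of Eq (1)."""
    base = 2.0 - math.exp(-lam)
    return [-math.log(2.0 - base ** random.random()) / lam for _ in range(n)]

def loglik_lambda(lam, u):
    """LET log-likelihood as a function of lam only (baseline log f terms dropped,
    since they are constant in lam)."""
    n = len(u)
    return (n * math.log(lam) - lam * sum(u)
            - n * math.log(math.log(2.0 - math.exp(-lam)))
            - sum(math.log(2.0 - math.exp(-lam * ui)) for ui in u))

random.seed(11)
lam_true, n, N = 2.0, 100, 100
grid = [0.1 * k for k in range(1, 61)]   # candidate lambda values in (0, 6]
est = []
for _ in range(N):
    u = sample_F_values(n, lam_true)     # one fresh sample per replication
    est.append(max(grid, key=lambda l: loglik_lambda(l, u)))
bias = sum(est) / N - lam_true
mse = sum((e - lam_true) ** 2 for e in est) / N
```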

Real data analysis
In this section, the fitting potential of our new procedure is evaluated by means of three real data sets, of which the last one is censored. In each case, we compare our LET-F model with competitors from the literature.

Non-censored data
The first data set contains the failure stresses (in GPa) of 64 bundles of carbon fibres and was also used by [18]. The second data set, presented by [19], concerns the survival times (in days) of guinea pigs infected with virulent tubercle bacilli.
The proposed LET-F model is compared with the basic Fréchet (F) distribution as well as other extensions of it, such as the logarithmic transformed Fréchet (LTF) of [6], the Exponentiated Fréchet (EF) as initiated by [2], the Marshall-Olkin Fréchet (MOF) of [1], and the Kumaraswamy Fréchet (KF) according to the construction of [4]. We use the Kolmogorov-Smirnov (KS), Cramér-von Mises (W*) and Anderson-Darling (A) goodness-of-fit statistics, together with the Deviance Information Criterion (DIC), for the comparison. The DIC is a generalized form of the AIC and is widely used for assessing model adequacy (see [20] and [21]). The best model exhibits the smallest values of these statistics. The results are obtained using R. In the Appendix, we give the respective pdfs of the above-mentioned distributions.
In Table 4, we provide the values of the KS statistic together with the related p-values. As we can see, our LET-F model exhibits the lowest KS value for both data sets and, consequently, the largest p-value. For the second data set, the LTF model, which we try to improve on in particular, is clearly rejected by the KS test. To further corroborate the strength of our LET-F model, we provide in Table 5 the corresponding values of the W*, A and DIC statistics. They also reveal that the LET-F model is very appropriate for these data sets, as it outperforms its competitors. For the sake of illustration, the histograms of both data sets and the fitted pdfs of all considered models are provided in Fig 4, while Fig 5 exhibits the corresponding PP-plots. The better fit of the LET-F model for both data sets can thus also be recognized visually. Finally, our likelihood ratio test yields a p-value of 0.027 for the first data set and 0.000 for the second. Thus, the Fréchet distribution is rejected in favour of the LET-F model for data set 2 at any level, while for the first data set it is rejected at the classical 5% level but no longer at the 2% level. The maximum likelihood estimates (MLEs), Bayes estimates (BEs), and their corresponding standard errors (SEs) and posterior standard deviations (SDs), respectively, for the parameters of the LET-F and the competitor models are given in Table 6.
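For completeness, the KS distance underlying Table 4 is simple to compute by hand; this sketch compares an empirical cdf against any fitted cdf (the tiny artificial sample below is an illustration only, not one of the paper's data sets).

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov distance:
    D_n = max_i max( |i/n - F(x_(i))| , |F(x_(i)) - (i-1)/n| )."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        Fx = cdf(x)
        d = max(d, abs(i / n - Fx), abs(Fx - (i - 1) / n))
    return d

# Evenly spaced midpoints against the uniform cdf give exactly D_n = 1/(2n):
sample = [(i + 0.5) / 10 for i in range(10)]
d = ks_statistic(sample, lambda x: x)   # 1/(2*10) = 0.05
```

Plugging in a fitted LET-F cdf for `cdf` reproduces the kind of comparison reported in Table 4 (up to the estimation of the parameters).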

Right-censored data
We now consider a data set presented by [22] and also used by [23]. This data set is randomly censored; see [22]. Here we compare our LET-F model with three models proposed recently by [22], namely the long-term Fréchet (LTF), long-term Weibull (LTW) and long-term weighted Lindley (LTWL) distributions. The general form of a long-term survival function is S*(x) = p + (1 − p)S(x), where S(x) is the survival function of any distribution and p denotes the probability of being cured. The corresponding cdfs and pdfs can then be deduced from this mixture survival function.
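The mixture form S*(x) = p + (1 − p)S(x) is straightforward to code; the Fréchet survival below (with arbitrary α = 2, β = 1) is only an illustration of how a long-term model plateaus at the cure fraction p instead of decaying to zero.

```python
import math

def long_term_survival(S, p):
    """Cure-rate mixture: S*(x) = p + (1 - p) * S(x)."""
    return lambda x: p + (1.0 - p) * S(x)

# Illustrative baseline survival: Frechet with alpha = 2, beta = 1.
S_frechet = lambda x: 1.0 - math.exp(-((1.0 / x) ** 2.0))
S_star = long_term_survival(S_frechet, 0.3)
# S* starts at 1 and decreases toward the cure fraction p = 0.3 as x grows.
```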
This time, we use the Akaike information criterion (AIC) and the DIC as model comparison criteria; the smaller their values, the better the fit (of course, the same tests as in the previous section could also be run here). Table 7 contains the maximum likelihood and Bayes estimates of the parameters. To quantify the variability of the estimates, the SEs of the MLEs and the SDs of the posterior distributions are reported in parentheses. The log-likelihood (L) value and AIC of the proposed LET-F model are almost the same as those of the LTF model, and clearly better than those of the other two models. In terms of the DIC, our proposed LET-F model performs better than all competitive models.

Conclusion
In this paper, we have proposed a new general construction of flexible lifetime distributions, rendering any existing baseline distribution more versatile through a simple transformation. We have discussed properties of the new models and explained how to estimate the parameters for complete and censored data sets. Monte Carlo and hit-and-run Metropolis-Hastings simulation studies have revealed that the classical and Bayesian estimation procedures work well. On the basis of three distinct real data sets, we could see that the LET-F model, based on the Fréchet distribution as baseline, is a very good competitor to existing distributions, especially to existing generalizations of the Fréchet distribution. These good fitting capacities, combined with the simplicity of our proposal, make a strong case for using our construction in several practical situations.
Supporting information S1 Appendix. Probability density functions of the competitor models, and the data sets.