Type-I heavy tailed family with applications in medicine, engineering and insurance

In the present study, a new class of heavy-tailed distributions is introduced using the T-X family approach. The proposed family is called the type-I heavy-tailed (TI-HT) family. A special model of the proposed class, named the Type-I Heavy Tailed Weibull (TI-HTW) model, is studied in detail. We adopt maximum likelihood estimation for its parameters and assess the performance of the estimators in terms of biases and mean squared errors via a Monte Carlo simulation study. Actuarial quantities such as value at risk and tail value at risk are derived. A simulation study of these actuarial measures is conducted, indicating that the proposed TI-HTW is a heavy-tailed model. Finally, we provide a comparative study to illustrate the proposed method by analyzing three real data sets from different disciplines, namely reliability engineering, bio-medical sciences and financial sciences. The results for the new TI-HTW model are compared with those for the Weibull and some other non-nested distributions. A Bayesian analysis is discussed to measure the model complexity based on the deviance information criterion.


Introduction
In many practical situations, such as those arising in financial sciences, reliability engineering and bio-medical sciences, data are usually positive, and their distribution is unimodal and hump-shaped, with extreme values yielding heavier tails than those of the classical models. For example, in health science research, (a) medical expenditures that exceed a given threshold [1] and (b) the length of stay in hospitals [2,3] present highly skewed, heavy-tailed data, for which the standard classical distributions and simple variable transformations are insufficient to provide an adequate fit. In reliability engineering, the interest most often lies in the occurrence of rather exceptional events, which are associated with the tail of a statistical distribution. Earthquakes, tsunamis, hurricanes and massive electrical or power failures are examples of such rare and extreme events [4]. All the aforementioned events, and the rate at which they happen, are associated with the heaviness of the tail and the shape of the distribution. In financial and risk management problems, one of the important tasks is to accurately predict losses that occur with a high fiscal value. Underestimation of the probability of these losses …
The rest of this study is arranged as follows: the proposed family is discussed in Section 2. A sub-case of the proposed class is introduced, and the shapes of its density and hazard rate functions are sketched, in Section 3. Statistical properties of the new family are obtained in Section 4. The expressions for the maximum likelihood estimators are derived in Section 5, where a Monte Carlo simulation study is also presented. The actuarial measures are derived in Section 6, together with a simulation study based on these measures. Practical applications are discussed in Section 7. Finally, the article is concluded in Section 8.

Proposed method
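This section offers the genesis of the proposed method. Recently, [16] proposed the T-X family method, which is specified by the cumulative distribution function (cdf)
$$G(x) = \int_{a}^{K[F(x;\xi)]} v(t)\,dt, \qquad x \in \mathbb{R}, \tag{1}$$
where v(t) is the density of a random variable T and K[F(x; ξ)] fulfills certain conditions; see [16]. The probability density function (pdf) corresponding to Eq (1) is
$$g(x) = \left\{\frac{\partial}{\partial x} K[F(x;\xi)]\right\} v\{K[F(x;\xi)]\}, \qquad x \in \mathbb{R}.$$
Deploying the T-X approach, a good deal of new families of statistical models have been proposed in the literature; see [17][18][19][20] and [21]. Let T ~ exp(1); then its cdf is given by
$$V(t) = 1 - e^{-t}, \qquad t \ge 0.$$
The corresponding density function is
$$v(t) = e^{-t}, \qquad t \ge 0. \tag{3}$$
If v(t) follows Eq (3), then setting
$$K[F(x;\xi)] = -\log\left\{\left[\frac{1-F(x;\xi)}{1-(1-\theta)F(x;\xi)}\right]^{\theta}\right\}$$
in Eq (1) gives the cdf of the type-I heavy-tailed (TI-HT) family as
$$G(x;\theta,\xi) = 1 - \left[\frac{1-F(x;\xi)}{1-(1-\theta)F(x;\xi)}\right]^{\theta}, \qquad \theta > 0,\; x \in \mathbb{R}, \tag{4}$$
where F(x; ξ) is the baseline distribution function, which may depend on the parameter vector ξ. From Eq (4), we can see that G(x; θ, ξ) = F(x; ξ) for θ = 1.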
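The step from the T-X cdf to the TI-HT cdf can be written out explicitly. The following worked derivation is a sketch that assumes T ~ exp(1), so that V(t) = 1 − e^{−t}, and the choice of K[F(x; ξ)] stated above; the shorthand F = F(x; ξ), f = f(x; ξ) and D = 1 − (1 − θ)F is introduced here only for readability.

```latex
\begin{align*}
G(x;\theta,\xi)
  &= V\big(K[F]\big) = 1 - \exp\{-K[F]\} \\
  &= 1 - \exp\left\{\log\left[\frac{1-F}{D}\right]^{\theta}\right\}
   = 1 - \left[\frac{1-F}{D}\right]^{\theta},
   \qquad D = 1-(1-\theta)F. \\[4pt]
\frac{d}{dx}\left[\frac{1-F}{D}\right]
  &= \frac{-f\,D + (1-F)(1-\theta)f}{D^{2}}
   = \frac{-\theta f}{D^{2}}. \\[4pt]
g(x;\theta,\xi)
  &= -\theta\left[\frac{1-F}{D}\right]^{\theta-1}\cdot\frac{-\theta f}{D^{2}}
   = \frac{\theta^{2} f\,(1-F)^{\theta-1}}{D^{\theta+1}}.
\end{align*}
```

Setting θ = 1 gives D = 1, so that G reduces to F and g reduces to f, which agrees with the remark that the baseline model is recovered for θ = 1.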
Some key motivations of the proposed TI-HT method are the following: (i) to offer an easy and convenient approach for modifying existing models, (ii) to improve the flexibility of the models available in the literature, (iii) to introduce a generalized form of existing models with closed-form expressions for their distribution functions, (iv) to provide a better fit to real-world data than other models with fewer parameters, the same number of parameters or a higher number of parameters, and (v) to provide an adequate fit to heavy-tailed data in applied fields such as reliability engineering, medical and financial sciences, and other related fields.
The pdf associated with Eq (4) is
$$g(x;\theta,\xi) = \frac{\theta^{2} f(x;\xi)\,[1-F(x;\xi)]^{\theta-1}}{[1-(1-\theta)F(x;\xi)]^{\theta+1}}, \qquad x \in \mathbb{R}, \tag{5}$$
where f(x; ξ) is the baseline pdf. We focus on a special sub-case of the new family, called the type-I heavy-tailed Weibull (TI-HTW) distribution. Finally, we direct our attention to the results related to the TI-HTW model with real-life data in three different disciplines. The first data set is taken from the bio-medical field, and the results of the TI-HTW model are compared to those of five competitor distributions: (a) the two-parameter Weibull distribution and (b) three-parameter models, namely the alpha power transformed Weibull (APTW), Marshall-Olkin Weibull (MOW), transmuted Weibull (TW) and modified Weibull (MW) distributions. The second data set is taken from reliability engineering, and the new model is compared with three other well-known distributions: (a) the three-parameter extended alpha power transformed Weibull (Ex-APTW) distribution and (b) the four-parameter Kumaraswamy Weibull (Ku-W) and beta Weibull (BW) distributions. The third data set is taken from financial sciences, and the results of the proposed model are compared with those of the Weibull and other heavy-tailed models, including the Lomax and Burr-XII distributions.

Sub-model description
In this section, we introduce the TI-HTW distribution and discuss its special cases.

Special cases of the TI-HTW distribution
Let X follow the TI-HTW model with parameters (α, θ, γ). Then X reduces to
1. the Weibull model with parameters α and γ, when θ = 1.
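The reduction to the Weibull model at θ = 1 can be checked numerically. The sketch below assumes the baseline parameterization F(x) = 1 − exp(−γ x^α) for x > 0, which is not stated in the text above, and uses the TI-HT cdf G = 1 − [(1 − F)/(1 − (1 − θ)F)]^θ together with its derivative; all function names are illustrative.

```python
import math


def weibull_cdf(x, alpha, gamma):
    # Assumed baseline: F(x) = 1 - exp(-gamma * x**alpha) for x > 0.
    return -math.expm1(-gamma * x ** alpha) if x > 0 else 0.0


def weibull_pdf(x, alpha, gamma):
    if x <= 0:
        return 0.0
    return alpha * gamma * x ** (alpha - 1.0) * math.exp(-gamma * x ** alpha)


def ti_ht_cdf(F, theta):
    # TI-HT cdf evaluated at a baseline probability F.
    return 1.0 - ((1.0 - F) / (1.0 - (1.0 - theta) * F)) ** theta


def ti_ht_pdf(f, F, theta):
    # Derivative of ti_ht_cdf with respect to x, given baseline pdf f and cdf F.
    denom = (1.0 - (1.0 - theta) * F) ** (theta + 1.0)
    return theta ** 2 * f * (1.0 - F) ** (theta - 1.0) / denom


def tihtw_cdf(x, alpha, theta, gamma):
    return ti_ht_cdf(weibull_cdf(x, alpha, gamma), theta)


def tihtw_pdf(x, alpha, theta, gamma):
    return ti_ht_pdf(weibull_pdf(x, alpha, gamma), weibull_cdf(x, alpha, gamma), theta)


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0):
        print(x, tihtw_cdf(x, 1.5, 1.0, 0.8), weibull_cdf(x, 1.5, 0.8))
```

For moderate x the two printed columns coincide at θ = 1. For very large x the baseline cdf rounds to 1 in floating point, so the pdf helper should only be used where 1 − F is still representable.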

Statistical properties
In the following subsections, we study some statistical properties of the TI-HT distributions, including the quantile function (qf), the rth moment and the moment generating function.

Quantile function
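The qf of the TI-HT distributions, obtained by inverting the cdf of the family, is
$$x = Q(u) = F^{-1}\left(\frac{1-(1-u)^{1/\theta}}{1-(1-\theta)(1-u)^{1/\theta}};\,\xi\right), \tag{8}$$
where u ∈ (0, 1) and F^{-1} is the baseline quantile function. From Eq (8), we can see that the proposed model has a closed-form qf whenever the baseline does, which makes it easy to generate random numbers for any such sub-case of the TI-HT family.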
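Random number generation by inversion can be sketched as follows. The code assumes the Weibull baseline F(x) = 1 − exp(−γ x^α), a parameterization not given in the text above, and works with the baseline survival probability 1 − F = θw/(1 − (1 − θ)w), where w = (1 − u)^{1/θ}, on the log scale for numerical stability. Function names are illustrative.

```python
import math
import random


def tihtw_quantile(u, alpha, theta, gamma):
    """Quantile of the TI-HT family with an assumed Weibull baseline."""
    log_w = math.log1p(-u) / theta            # log of w = (1 - u)**(1/theta)
    w = math.exp(log_w)
    # log of the baseline survival probability at the u-quantile
    log_s = math.log(theta) + log_w - math.log(1.0 - (1.0 - theta) * w)
    z = max(0.0, -log_s)                      # guard against rounding at u = 0
    return (z / gamma) ** (1.0 / alpha)       # invert 1 - F(x) = exp(-gamma x**alpha)


def tihtw_cdf(x, alpha, theta, gamma):
    F = -math.expm1(-gamma * x ** alpha)
    return 1.0 - ((1.0 - F) / (1.0 - (1.0 - theta) * F)) ** theta


def tihtw_sample(n, alpha, theta, gamma, rng):
    # Inverse-transform sampling: plug uniform draws into the quantile function.
    return [tihtw_quantile(rng.random(), alpha, theta, gamma) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(2020)
    draws = tihtw_sample(5, alpha=1.5, theta=0.5, gamma=2.0, rng=rng)
    print([round(x, 4) for x in draws])
```

Passing an explicit `random.Random` instance keeps the draws reproducible without touching the global random state.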

Moments
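This sub-section deals with the derivation of the rth moment of the TI-HT distributions. The rth moment of the TI-HT distributions is derived as
$$\mu'_r = \int_{-\infty}^{\infty} x^{r} g(x;\theta,\xi)\,dx. \tag{9}$$
Using Eq (5) in Eq (9), we have
$$\mu'_r = \theta^{2}\int_{-\infty}^{\infty} x^{r}\,\frac{f(x;\xi)\,[1-F(x;\xi)]^{\theta-1}}{[1-(1-\theta)F(x;\xi)]^{\theta+1}}\,dx. \tag{10}$$
Using the expansion (https://math.stackexchange.com/questions/1624974/series-expansion-1-1-xn)
$$\frac{1}{(1-x)^{n}} = \sum_{i=0}^{\infty}\binom{n+i-1}{i}x^{i}, \qquad |x|<1, \tag{11}$$
with x = (1 − θ)F(x; ξ) and n = θ + 1 in Eq (11), we get
$$\frac{1}{[1-(1-\theta)F(x;\xi)]^{\theta+1}} = \sum_{i=0}^{\infty}\binom{\theta+i}{i}(1-\theta)^{i}F(x;\xi)^{i}. \tag{12}$$
Also using the series representation
$$(1-y)^{m} = \sum_{j=0}^{\infty}(-1)^{j}\binom{m}{j}y^{j}, \qquad |y|<1, \tag{13}$$
with y = F(x; ξ) and m = θ − 1 in Eq (13), we get
$$[1-F(x;\xi)]^{\theta-1} = \sum_{j=0}^{\infty}(-1)^{j}\binom{\theta-1}{j}F(x;\xi)^{j}. \tag{14}$$
Using Eqs (12) and (14) in Eq (10), we have
$$\mu'_r = \theta^{2}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}(-1)^{j}\binom{\theta+i}{i}\binom{\theta-1}{j}(1-\theta)^{i}\,\kappa_{r,i+j}, \tag{15}$$
where $\kappa_{r,i+j} = \int_{-\infty}^{\infty} x^{r} f(x;\xi)F(x;\xi)^{i+j}\,dx$ and the binomial coefficients are understood in the generalized sense for non-integer θ.
For some pre-defined parameter values, numerical results for the descriptive measures (mean, variance, skewness and kurtosis) of the TI-HTW model are given in Tables 1 and 2. For γ = 1.5 and different values of α and θ, plots of the mean, variance, skewness and kurtosis of the TI-HTW distribution are displayed in Figs 3 and 4.
The moment generating function (mgf) of the TI-HT random variable X, say M_X(t), is derived as
$$M_X(t) = \sum_{r=0}^{\infty}\frac{t^{r}}{r!}\,\mu'_r. \tag{16}$$
Using Eq (15) in Eq (16), we get the mgf of the TI-HT distributions.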
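The descriptive measures discussed above can be cross-checked numerically without summing the double series. The sketch below does not implement the series in Eq (15); instead it uses the identity E[X^r] = ∫₀¹ Q(u)^r du with a midpoint rule, under the assumed Weibull baseline F(x) = 1 − exp(−γ x^α). The parameter values in the demo are illustrative and are not those of Tables 1 and 2.

```python
import math


def tihtw_quantile(u, alpha, theta, gamma):
    # Quantile under the assumed baseline F(x) = 1 - exp(-gamma * x**alpha).
    log_w = math.log1p(-u) / theta
    w = math.exp(log_w)
    log_s = math.log(theta) + log_w - math.log(1.0 - (1.0 - theta) * w)
    return (max(0.0, -log_s) / gamma) ** (1.0 / alpha)


def raw_moments(alpha, theta, gamma, max_r=4, steps=200000):
    """Approximate E[X**r], r = 0..max_r, by a midpoint rule on (0, 1)."""
    sums = [0.0] * (max_r + 1)
    h = 1.0 / steps
    for k in range(steps):
        x = tihtw_quantile((k + 0.5) * h, alpha, theta, gamma)
        power = 1.0
        for r in range(max_r + 1):
            sums[r] += power
            power *= x
    return [s * h for s in sums]


def summary(alpha, theta, gamma, steps=200000):
    """Return (mean, variance, skewness, kurtosis) from the raw moments."""
    m = raw_moments(alpha, theta, gamma, 4, steps)
    mean = m[1]
    var = m[2] - mean ** 2
    mu3 = m[3] - 3.0 * mean * m[2] + 2.0 * mean ** 3
    mu4 = m[4] - 4.0 * mean * m[3] + 6.0 * mean ** 2 * m[2] - 3.0 * mean ** 4
    return mean, var, mu3 / var ** 1.5, mu4 / var ** 2


if __name__ == "__main__":
    print(summary(alpha=1.5, theta=0.5, gamma=1.5))
```

Because the integrand grows slowly near u = 1, the midpoint rule slightly underestimates the higher moments; increasing `steps` tightens the approximation.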

Estimation and simulation study
In the following section, we obtain the maximum likelihood estimators (MLEs) of the parameters of the proposed family. Furthermore, we conduct a Monte Carlo simulation study to assess the behavior of these estimators.

Maximum likelihood estimation
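Numerous approaches have been suggested for estimating unknown parameters. Among them, the maximum likelihood (ML) approach is the most prominent and frequently used method. The estimators obtained via this approach possess useful properties and can be utilized for constructing confidence intervals and other statistical tests. The normal approximation of the MLEs can easily be treated either numerically or analytically. For more details about maximum likelihood estimation, we refer to [22,23]. In this sub-section, we adopt the ML approach for estimating the parameters of the TI-HT family. Suppose X_1, X_2, ..., X_n form an observed sample taken randomly from the TI-HT family with pdf (5). The log-likelihood function corresponding to (5) is
$$\ell(\Theta) = 2n\log\theta + \sum_{i=1}^{n}\log f(x_i;\xi) + (\theta-1)\sum_{i=1}^{n}\log[1-F(x_i;\xi)] - (\theta+1)\sum_{i=1}^{n}\log[1-(1-\theta)F(x_i;\xi)], \tag{17}$$
where Θ collects the unknown parameters, that is, Θ = (θ, ξ)^T in general and Θ = (α, γ, θ)^T for the TI-HTW sub-model. Computer software such as SAS (PROC NLMIXED) can be used to maximize the log-likelihood function directly or via differentiating Eq (17). The partial derivatives of Eq (17) are given by
$$\frac{\partial \ell(\Theta)}{\partial\theta} = \frac{2n}{\theta} + \sum_{i=1}^{n}\log[1-F(x_i;\xi)] - \sum_{i=1}^{n}\log[1-(1-\theta)F(x_i;\xi)] - (\theta+1)\sum_{i=1}^{n}\frac{F(x_i;\xi)}{1-(1-\theta)F(x_i;\xi)}$$
and
$$\frac{\partial \ell(\Theta)}{\partial\xi} = \sum_{i=1}^{n}\frac{\partial f(x_i;\xi)/\partial\xi}{f(x_i;\xi)} - (\theta-1)\sum_{i=1}^{n}\frac{\partial F(x_i;\xi)/\partial\xi}{1-F(x_i;\xi)} + (\theta+1)(1-\theta)\sum_{i=1}^{n}\frac{\partial F(x_i;\xi)/\partial\xi}{1-(1-\theta)F(x_i;\xi)}.$$
Setting ∂ℓ(Θ)/∂θ and ∂ℓ(Θ)/∂ξ equal to zero and solving the resulting nonlinear system of equations simultaneously yields the MLEs θ̂ and ξ̂, respectively.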
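The log-likelihood and its derivative with respect to θ can be sketched in a few lines. The code assumes the Weibull baseline F(x) = 1 − exp(−γ x^α), which is not stated in the text above, and obtains both functions by taking logs of the TI-HT density θ² f (1 − F)^{θ−1} / [1 − (1 − θ)F]^{θ+1}. The data values are made up for illustration.

```python
import math


def tihtw_loglik(data, alpha, theta, gamma):
    """Log-likelihood of the TI-HT family with an assumed Weibull baseline."""
    total = 2.0 * len(data) * math.log(theta)
    for x in data:
        z = gamma * x ** alpha
        F = -math.expm1(-z)                      # baseline cdf
        log_f = math.log(alpha * gamma) + (alpha - 1.0) * math.log(x) - z
        log_S = -z                               # log(1 - F) for this baseline
        total += log_f + (theta - 1.0) * log_S
        total -= (theta + 1.0) * math.log(1.0 - (1.0 - theta) * F)
    return total


def tihtw_score_theta(data, alpha, theta, gamma):
    """Partial derivative of the log-likelihood with respect to theta."""
    score = 2.0 * len(data) / theta
    for x in data:
        z = gamma * x ** alpha
        F = -math.expm1(-z)
        D = 1.0 - (1.0 - theta) * F
        score += -z - math.log(D) - (theta + 1.0) * F / D
    return score


if __name__ == "__main__":
    sample = [0.5, 1.2, 2.0, 3.3]                # hypothetical observations
    print(tihtw_loglik(sample, 1.4, 0.7, 0.8))
    print(tihtw_score_theta(sample, 1.4, 0.7, 0.8))
```

In practice these functions would be handed to a numerical optimizer; comparing the analytic score with a finite-difference derivative of the log-likelihood is a cheap way to catch sign errors before doing so.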

Monte Carlo simulation study
In this sub-section, we investigate the performance of the MLEs. For simulation purposes, the special sub-model, the TI-HTW distribution, is considered. The simulation process is conducted based on the following steps:
• N = 1000 samples of size n = 25, 50, 75, . . ., 1000 are generated from the TI-HTW model with parameters α, γ and θ. The inversion method of generating random numbers is used.
• The biases and mean squared errors (MSEs) of the estimates of the model parameters are computed.
The simulation results are provided in Tables 3 and 4. The results in these tables indicate that the estimates of the TI-HTW parameters behave well, showing small biases and MSEs in all studied cases; that is, the estimates are quite reliable and close to the actual values. Further, the biases approach 0 as the sample size increases, suggesting that the estimators are asymptotically unbiased. Moreover, the MSEs decrease as the sample size increases, which is consistent with the estimators being consistent for the TI-HTW parameters.
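The steps above can be sketched in a deliberately simplified form. The code below is not the study's design: it assumes the Weibull baseline F(x) = 1 − exp(−γ x^α), treats α and γ as known, estimates only θ by a golden-section search on the log-likelihood, and uses far fewer replications than N = 1000. All names and parameter values are illustrative.

```python
import math
import random


def tihtw_quantile(u, alpha, theta, gamma):
    # Inversion under the assumed baseline F(x) = 1 - exp(-gamma * x**alpha).
    log_w = math.log1p(-u) / theta
    w = math.exp(log_w)
    log_s = math.log(theta) + log_w - math.log(1.0 - (1.0 - theta) * w)
    return (max(0.0, -log_s) / gamma) ** (1.0 / alpha)


def loglik_theta(theta, terms):
    # Theta-dependent part of the log-likelihood; terms holds (F_i, log(1 - F_i)).
    total = 2.0 * len(terms) * math.log(theta)
    for F, log_S in terms:
        total += (theta - 1.0) * log_S
        total -= (theta + 1.0) * math.log(1.0 - (1.0 - theta) * F)
    return total


def golden_max(fn, lo, hi, iters=50):
    # Golden-section search for the maximizer of a unimodal function on [lo, hi].
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = fn(c), fn(d)
    for _ in range(iters):
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = fn(c)
        else:
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = fn(d)
    return 0.5 * (a + b)


def simulate(n, reps, alpha, theta, gamma, seed):
    """Return (bias, MSE) of the MLE of theta with alpha and gamma held fixed."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        xs = [tihtw_quantile(rng.random(), alpha, theta, gamma) for _ in range(n)]
        zs = [gamma * x ** alpha for x in xs]
        terms = [(-math.expm1(-z), -z) for z in zs]
        estimates.append(golden_max(lambda t: loglik_theta(t, terms), 0.05, 10.0))
    bias = sum(estimates) / reps - theta
    mse = sum((e - theta) ** 2 for e in estimates) / reps
    return bias, mse


if __name__ == "__main__":
    for n in (50, 200):
        print(n, simulate(n, reps=100, alpha=1.5, theta=1.0, gamma=1.0, seed=n))
```

A full replication of the study would maximize over (α, γ, θ) jointly, for example with a general-purpose optimizer, and would report the bias and MSE of each parameter separately.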

Actuarial measures
In actuarial sciences and management institutions, one of the key tasks of the actuaries is to evaluate the exposure of market risk in a portfolio of instruments. In this section, we calculate some important risk measures including value at risk (VaR) and tail value at risk (TVaR) for the TI-HTW, which play a crucial role in portfolio optimization under uncertainty.

VaR measure
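Let X follow the TI-HTW model with pdf (7). Then the VaR of X, denoted by VaR_q (where q is a specified level of significance), is the qth quantile of X. From Eq (8), it is given by
$$\mathrm{VaR}_q = F^{-1}\left(\frac{1-(1-q)^{1/\theta}}{1-(1-\theta)(1-q)^{1/\theta}};\,\xi\right),$$
where F^{-1} is the quantile function of the Weibull baseline.

TVaR measure
The TVaR is one of the most important risk measures; it quantifies the expected loss given that an event outside a specified level of probability has occurred. Let X follow the TI-HTW model. Then the TVaR of X is computed as
$$\mathrm{TVaR}_q = \frac{1}{1-q}\int_{\mathrm{VaR}_q}^{\infty} x\,g(x;\theta,\xi)\,dx. \tag{21}$$
Inserting (12) and (14) in (21) gives a double series with coefficients A_{i,j,θ} = θ …, and solving the remaining integral gives a series with coefficients B_{i,j,k,θ} = θ² Σ … .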
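Both risk measures can be evaluated numerically from the quantile function alone. The sketch below assumes the Weibull baseline F(x) = 1 − exp(−γ x^α) and, instead of the series form, uses the identity TVaR_q = (1/(1 − q)) ∫_q^1 VaR_u du, which holds for continuous distributions; the integral is approximated by a midpoint rule. Function names and the parameter values in the demo are illustrative.

```python
import math


def tihtw_quantile(u, alpha, theta, gamma):
    # Quantile under the assumed baseline F(x) = 1 - exp(-gamma * x**alpha).
    log_w = math.log1p(-u) / theta
    w = math.exp(log_w)
    log_s = math.log(theta) + log_w - math.log(1.0 - (1.0 - theta) * w)
    return (max(0.0, -log_s) / gamma) ** (1.0 / alpha)


def tihtw_var(q, alpha, theta, gamma):
    """Value at risk at level q: the q-th quantile."""
    return tihtw_quantile(q, alpha, theta, gamma)


def tihtw_tvar(q, alpha, theta, gamma, steps=200000):
    """Tail value at risk: average of the quantile function over (q, 1)."""
    h = (1.0 - q) / steps
    total = 0.0
    for k in range(steps):
        total += tihtw_quantile(q + (k + 0.5) * h, alpha, theta, gamma)
    return total * h / (1.0 - q)


if __name__ == "__main__":
    for q in (0.90, 0.95, 0.99):
        print(q, tihtw_var(q, 1.5, 0.5, 1.0), tihtw_tvar(q, 1.5, 0.5, 1.0))
```

For α = γ = θ = 1 the model is the unit exponential, for which TVaR_q = VaR_q + 1; this gives a quick sanity check of the integration.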

Numerical study of the risk measures
In the current sub-section, we conduct a numerical study of the VaR and TVaR measures for the TI-HTW distribution. The VaR and TVaR of the TI-HTW distribution are compared with those of the Weibull distribution as a nested model and the exponentiated Weibull (EW) distribution [24] as a non-nested model, which is one of the most prominent generalizations of the Weibull model. The numerical results are obtained as follows.
1. We generated a sample of size n = 100 from the Weibull, EW and TI-HTW distributions and their parameters have been estimated via ML method.
2. 1000 repetitions are made to calculate the VaR and TVaR for these distributions.
3. The numerical results of the risk measures are provided in Tables 5 and 6. Further, these results are displayed graphically in Figs 5 and 6, respectively.
The simulation is performed for the Weibull, EW and proposed models for selected values of their parameters. A model with higher values of VaR and TVaR is said to have a heavier tail. The simulated results provided in Tables 5 and 6 show that the proposed TI-HTW model has higher values of the risk measures than the Weibull and EW distributions. Figs 5 and 6 also show that the proposed model has a heavier tail than the Weibull and EW distributions.
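The tail comparison can be illustrated without simulation by evaluating the risk measures at fixed parameter values. The sketch below is not the study's design (it involves no sampling, no ML estimation and no EW model); it assumes the Weibull baseline F(x) = 1 − exp(−γ x^α) and compares the Weibull case (θ = 1) with a hypothetical TI-HTW case (θ = 0.5) sharing the same α and γ.

```python
import math


def tihtw_quantile(u, alpha, theta, gamma):
    # Quantile under the assumed baseline; theta = 1 gives the Weibull quantile.
    log_w = math.log1p(-u) / theta
    w = math.exp(log_w)
    log_s = math.log(theta) + log_w - math.log(1.0 - (1.0 - theta) * w)
    return (max(0.0, -log_s) / gamma) ** (1.0 / alpha)


def tvar(q, alpha, theta, gamma, steps=20000):
    # Midpoint-rule average of the quantile function over (q, 1).
    h = (1.0 - q) / steps
    total = sum(tihtw_quantile(q + (k + 0.5) * h, alpha, theta, gamma)
                for k in range(steps))
    return total * h / (1.0 - q)


def risk_table(alpha, gamma, theta, levels, steps=20000):
    """Rows of (q, VaR Weibull, VaR TI-HTW, TVaR Weibull, TVaR TI-HTW)."""
    rows = []
    for q in levels:
        rows.append((q,
                     tihtw_quantile(q, alpha, 1.0, gamma),
                     tihtw_quantile(q, alpha, theta, gamma),
                     tvar(q, alpha, 1.0, gamma, steps),
                     tvar(q, alpha, theta, gamma, steps)))
    return rows


if __name__ == "__main__":
    for row in risk_table(1.5, 1.0, 0.5, (0.70, 0.80, 0.90, 0.95, 0.99)):
        print("  ".join(f"{v:8.4f}" for v in row))
```

With θ < 1 the TI-HT survival function lies above the baseline survival function at every point, so both risk measures exceed their Weibull counterparts at every level q; with θ > 1 the ordering reverses.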

Comparative study
In this section, we consider three heavy-tailed data sets from applied areas, namely medical, engineering and financial sciences, to study the flexibility of the proposed family. The key motivation for considering heavy-tailed distributions is that they can provide an adequate fit to heavy-tailed data. For each data set, the TI-HTW distribution is compared with different well-known distributions, and we observe that the proposed distribution outperforms the other competitors.
To decide about the goodness of fit among the applied distributions, we consider certain analytical measures. In this regard, we consider two discrimination measures, namely the Akaike information criterion (AIC) and the Bayesian information criterion (BIC); see [25].
In addition to the discrimination measures, other goodness-of-fit measures, namely the Cramér-von Mises (CM) test statistic, the Anderson-Darling (AD) test statistic and the Kolmogorov-Smirnov (KS) test statistic along with its p-value, are also considered. The formulae for these measures can be found in [26]. A distribution with lower values of these analytical measures (and a higher KS p-value) is considered to be a good candidate model among the applied distributions for the underlying data sets. By considering these statistical tools, we observe that the TI-HTW model is the best competitor among the models considered, because it has the smallest values of all the selected criteria.
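A minimal sketch of how the discrimination measures and the KS statistic are computed is given below. It uses the standard definitions AIC = 2k − 2ℓ and BIC = k log n − 2ℓ, where ℓ is the maximized log-likelihood, k the number of parameters and n the sample size; the numbers in the demo are hypothetical and are not taken from the tables of this study.

```python
import math


def aic(loglik, k):
    """Akaike information criterion."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2.0 * loglik


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical cdf and a fitted cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        p = cdf(x)
        d = max(d, i / n - p, p - (i - 1) / n)
    return d


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods for a 2- and a 3-parameter model.
    print(aic(-410.2, 2), bic(-410.2, 2, 128))
    print(aic(-405.9, 3), bic(-405.9, 3, 128))
```

Both criteria penalize additional parameters, so a three-parameter model is preferred over a nested two-parameter model only if the gain in log-likelihood outweighs the penalty.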

A real life application from bio-medical sciences
The first data set is reported in [27], and it refers to the remission times of bladder cancer patients. For the first data set, the TI-HTW distribution is compared with the Weibull, MOW [28], MW [29], TW [30] and APTW [31] distributions. A number of authors have used these distributions to model bio-medical data sets. For example, [32] used the Weibull and MOW distributions to model the survival times of cancer patients. These data were analyzed by [33] and [34].
The maximum likelihood estimates of the models for the cancer data are presented in Table 7. The analytical measures of the competing models are provided in Table 8. From Table 8, it is clear that the proposed distribution has lower values of these measures than the other models. The fitted cdf and Kaplan-Meier survival plots of the proposed model for the cancer data are plotted in Fig 7. The PP plot of the TI-HTW model and the box plot of the cancer data are sketched in Fig 8. From Fig 7, we can see that the proposed model follows the empirical cdf and the Kaplan-Meier survival curve very closely. From Fig 8, we can see that the data set is skewed to the right (see the box plot) and that the proposed model closely follows the reference line of the PP plot.

A real life application from reliability engineering
Here, we investigate the TI-HTW distribution by analyzing a heavy-tailed reliability engineering data set, reported in [35], which refers to the failure times of a coating machine. To show the potential of the proposed method, the TI-HTW distribution is applied in comparison with the Ex-APTW, Ku-W and BW distributions. The Ku-W [36] and Ex-APTW [37] distributions have been used to model failure time data. Al-Malki [38] showed that the BW distribution is one of the most prominent extensions of the Weibull distribution and can be used quite effectively for failure time data. For data set 2, the values of the model parameters are reported in Table 9. The analytical measures of the proposed and other competing models are provided in Table 10. The estimated cdf and Kaplan-Meier survival plots are sketched in Fig 9, which shows that the proposed distribution follows the empirical cdf and the Kaplan-Meier survival curve very closely. The PP and box plots are sketched in Fig 10.

A real life application from insurance sciences
The third data set is from the insurance sciences and represents vehicle insurance losses; it is available at: http://www.businessandeconomics.mq.edu.au. For the third data set, the TI-HTW distribution is compared with the Weibull, Lomax and Burr-XII distributions, which are widely used in modeling financial and risk management problems. The Weibull distribution is one of the best competitors for modeling actuarial data up to a specified threshold; see [39]. Further, the Lomax [40] and Burr [41] distributions have been widely used for modeling data in the tail beyond the threshold.
Table 9. Estimated values of the model parameters with standard errors (in parentheses) of the fitted models for data 2.

PLOS ONE
The parameter values are reported in Table 11 for the insurance data, and the analytical measures are presented in Table 12.

Bayesian analysis
We adopt a Bayesian formulation for our proposed model and derive posterior inference using a Markov chain Monte Carlo (MCMC) algorithm. To generate MCMC samples from the posterior distribution of the parameters of our model, we have used the WinBUGS software [42]. Note that to specify a likelihood contribution for a distribution that is not listed in WinBUGS, we have used the "zeros trick" [43]. In particular, a Poisson(λ) observation of zero has likelihood exp(−λ), so if our observed data are a set of 0's and λ[i] is set to −log(L[i]), we obtain the correct likelihood contribution. Since λ[i] should always be > 0, as it is a Poisson mean, we may need to add a suitable constant c to ensure that it is positive. This is equivalent to multiplying each likelihood term by e^{−c}. This process does not influence the inference, since it is equivalent to multiplying the resulting posterior distribution by a constant term equal to e^{−nc}. Thus, the likelihood takes the form
$$\prod_{i=1}^{n}\frac{e^{-\lambda_i}\lambda_i^{0}}{0!} = \prod_{i=1}^{n} e^{-(-\log L_i + c)} = e^{-nc}\prod_{i=1}^{n} L_i,$$
where L_i denotes the likelihood contribution of the ith observation.
The choice of a good prior distribution plays a key role in Bayesian inference. In practice, no information is precise enough to lead to the exact determination of the prior distribution. However, non-informative priors, which allow the data to dominate the posterior distribution, are suggested for Bayes-MCMC methods. We consider standard distributions for the priors, namely gamma priors for α, γ and θ, as these are positive-valued parameters. Note that gamma priors are widely used in the Bayesian literature for positive-valued random variables. A simple (informal) method of assessing chain convergence is to look at graphical diagnostics such as trace plots, autocorrelation plots and density plots to determine the mixing of the chains.
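The arithmetic behind the zeros trick can be verified outside WinBUGS. The following sketch is a plain Python stand-in, not WinBUGS code: given per-observation log-likelihood terms log L[i] and a constant c, it forms the Poisson means λ[i] = −log L[i] + c and sums the Poisson log-probabilities of observing zero. The input values are made up for illustration.

```python
import math


def zeros_trick_loglik(log_lik_terms, c):
    """Log-likelihood implied by Poisson(lambda_i) observations of zero."""
    lambdas = [-ll + c for ll in log_lik_terms]
    if min(lambdas) <= 0.0:
        raise ValueError("constant c is too small: every Poisson mean must be positive")
    # The Poisson log-pmf at zero is -lambda, since P(0) = exp(-lambda).
    return sum(-lam for lam in lambdas)


if __name__ == "__main__":
    terms = [0.3, -1.2, -0.5]      # hypothetical log L[i]; a log-density can be positive
    c = 1.0
    implied = zeros_trick_loglik(terms, c)
    # The implied value differs from the target log-likelihood only by the constant n * c.
    print(implied, sum(terms) - len(terms) * c)
```

Because the offset n·c does not depend on the parameters, it cancels in the posterior, which is why any sufficiently large c can be used.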
If the chains show a reasonable degree of randomness between iterations, it signifies that the Markov chain has found an area of high likelihood and is integrating over the target density, indicating that it has converged. Moreover, we also use the Gelman-Rubin statistic R, another popular technique for diagnosing convergence. It is based on a comparison of the within-chain and between-chain variances. Values of R substantially above 1 indicate a lack of convergence; however, some authors suggest that R < 1.2 is acceptable. To examine the empirical performance of the proposed methodology in terms of model adequacy, we use the deviance information criterion (DIC), which is the most widely used criterion for model comparison in Bayesian analysis [44] and [45]. It is derived based on two principles: (i) goodness of fit, measured via the deviance statistic, and (ii) model complexity, measured by an estimate of the effective number of parameters, denoted by p_D. When comparing two or more models, it is suggested that if DIC_M − DIC_min > 10, or if the difference lies between 5 and 10, then there is considerably less support for Model M than for the model with the minimum DIC. However, if DIC_M − DIC_min < 5, there is no clear support for the model with the lowest DIC, and reporting only that model may lead to misleading inference.
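The Gelman-Rubin statistic can be computed directly from the stored chains. The sketch below implements the basic form of the diagnostic (without chain splitting or degrees-of-freedom corrections), where W is the mean within-chain variance and B/n is the variance of the chain means; the chains in the demo are synthetic Gaussian draws rather than output from the model fitted here.

```python
import math
import random


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * W + B / n
    return math.sqrt(var_plus / W)


if __name__ == "__main__":
    rng = random.Random(0)
    chain_a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    chain_b = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    shifted = [x + 3.0 for x in chain_b]      # a chain stuck in a different region
    print(gelman_rubin([chain_a, chain_b]))   # close to 1: chains agree
    print(gelman_rubin([chain_a, shifted]))   # well above 1.2: lack of convergence
```

In applied work the statistic is computed after discarding the burn-in iterations and separately for each monitored parameter.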

A real life application to AIDS data
[46] described a study involving 467 human immunodeficiency virus (HIV) infected patients who had failed or were intolerant to zidovudine therapy (ZT). The main objective was to compare two antiretroviral drugs for preventing the progression of HIV infection: didanosine (ddI) and zalcitabine (ddC). To analyze the data, we construct two Markov chains, each of 100,000 iterations following a burn-in period of 10,000 iterations, to approximate the posterior density. We consider here only the TI-HTW and Weibull distributions to model the time-to-event process. For the sake of simplicity, we summarize only the DIC values and the parameter estimates in Table 13. The DIC values of the TI-HTW and Weibull fits are 7314.9 and 7328.83, respectively. The difference of 13.93 exceeds 10, suggesting that the TI-HTW distribution provides a superior fit to the Weibull distribution.

Conclusions
The importance of extended distributions was first realized in financial sciences and later in other applied fields such as engineering and medical sciences. To model data in those fields, a number of methods have been introduced. In this context, we have studied a versatile three-parameter heavy-tailed model, called the type-I heavy-tailed Weibull distribution, as a special case of a new approach allowing closed-form expressions for some basic mathematical and other related properties. The proposed class is called the type-I heavy-tailed family. The usefulness of the proposed family of heavy-tailed distributions has been illustrated via three data sets from medical, engineering and financial sciences, for which the model performs better than the well-known competing heavy-tailed distributions. The family developed in this work is a promising method for modeling heavy-tailed data and may be useful for researchers who deal with such data sets. Thus, the new model can serve as a good alternative to other existing models.
Future work includes (i) bivariate extension of the actuarial measures and the Monte Carlo simulation study of these measures, (ii) modeling heavy-tailed data with bivariate extension, (iii) regression problems with covariates and (iv) parameter reduction.