Correction
10 Jul 2024: The PLOS One Staff (2024) Correction: Classical and Bayesian estimation for type-I extended-F family with an actuarial application. PLOS ONE 19(7): e0307200. https://doi.org/10.1371/journal.pone.0307200
Abstract
In this work, a new flexible class, called the type-I extended-F family, is proposed. A special sub-model of the proposed class, called the type-I extended-Weibull (TIEx-W) distribution, is explored in detail. Basic properties of the TIEx-W distribution are provided. The parameters of the TIEx-W distribution are estimated by eight classical methods of estimation. The performance of these estimators is explored using Monte Carlo simulation results for small and large samples. In addition, Bayesian estimation of the model parameters under different loss functions is provided for the real data set. The importance and flexibility of the TIEx-W model are illustrated by analyzing an insurance data set. The real-life insurance data show that the TIEx-W distribution provides a better fit than competing models such as the Lindley–Weibull, exponentiated Weibull, Kumaraswamy–Weibull, α logarithmic transformed Weibull, and beta Weibull distributions, among others.
Citation: Alfaer NM, Bandar SA, Kharazmi O, Al-Mofleh H, Ahmad Z, Afify AZ (2023) Classical and Bayesian estimation for type-I extended-F family with an actuarial application. PLoS ONE 18(2): e0275430. https://doi.org/10.1371/journal.pone.0275430
Editor: Srinivasa Rao Gadde, The University of Dodoma, TANZANIA, UNITED REPUBLIC OF
Received: August 5, 2022; Accepted: September 16, 2022; Published: February 2, 2023
Copyright: © 2023 Alfaer et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data can be found in the Github repository (https://github.com/almof1hm/Hospital-costs-in-the-state-of-Wisconsin/blob/main/Data.xlsx).
Funding: This study was funded by the Taif University Researchers Supporting Project (TURSP-2020/316), Taif University, Taif, Saudi Arabia. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Abbreviation: ABs, Absolute bias; ADEs, Anderson-Darling estimates; ALW, α logarithmic transformed Weibull; APT, Alpha-power transformation; AVEs, Average estimates of the parameters; BEs, Bayes estimates; Bur, Burr; BW, Beta Weibull; cdf, Cumulative distribution function; CVMEs, Cramér-von Mises estimates; EW, Exponentiated Weibull; Ex-APT, Extended alpha-power transformation; hrf, Hazard rate function; JPD, Joint posterior density; KMS, Kaplan-Meier survival; K-S, Kolmogorov-Smirnov; Kur, Kurtosis; KwW, Kumaraswamy Weibull; LFs, Loss functions; LiW, Lindley Weibull; Lo, Lomax; LSEs, Least squares estimates; MLEs, Maximum likelihood estimates; MPSEs, Maximum product of spacings estimates; MREs, Mean relative errors; MSELF, Modified SELF; MSEs, Mean-squared errors; PCEs, Percentiles estimates; pdf, Probability density function; PLF, Precautionary LF; qf, Quantile function; RADEs, Right-tail Anderson-Darling estimates; rv, Random variable; SELF, Squared error LF; SEs, Standard errors; Sk, Skewness; TIEx-F, Type-I extended-F; TIEx-W, Type-I extended-Weibull; Wi, Weibull; WLSEs, Weighted least squares estimates; WSELF, Weighted SELF
1 Introduction
Modeling insurance loss data using heavy-tailed distributions is an important subject for risk managers and actuaries. Insurance data sets are usually right-skewed, hump-shaped and unimodal, and they have a thick right tail. Distributions possessing such characteristics are considered prominent candidates for modeling such heavy-tailed data.
Heavy-tailed distributions are adopted to model insurance loss data and thereby help in assessing the level of business risk. Hence, due to their immense significance in actuarial science, these types of data have been studied extensively, and several distributions have been introduced in the actuarial literature.
Insurance loss data, financial returns, file sizes on network servers, etc., have been explored and modeled by several models such as the Lomax [1], Pareto [2], Weibull [3], Burr [4], log-logistic [5] and Stoppa [6] distributions, among others.
However, the classical distributions are still not flexible enough to adequately fit such data sets, and some distributions do not possess a closed form for the cumulative distribution function (cdf), which causes difficulties in parameter estimation. For example, (i) the Pareto distribution has a monotonically decreasing probability density function (pdf), and it often does not provide the best fit to many data sets, (ii) the Weibull distribution is suitable for modeling small losses but not large losses, and (iii) the log-normal distribution has no closed-form cdf, which complicates parameter estimation.
To overcome the aforementioned problems, researchers have proposed new distributions. Examples include the heavy-tailed distributions by [7], Pareto–Levy by [8], loss models by [9], Weibull–Pareto by [3], generalized log-Moyal by [10], and generalized Pareto by [11].
The TIEx-F family has some desirable characteristics. It is a novel and very simple approach that adds one extra parameter to generalize existing distributions; hence, it can provide extended versions of baseline distributions with closed-form expressions for their cdfs and hrfs. In the application considered here, the TIEx-F family provides a better fit than other competing modified models under the same baseline model. Further, a new lifetime model based on the TIEx-F family, called the TIEx-Weibull (TIEx-W) distribution, is studied. The TIEx-W distribution can provide increasing, decreasing and modified bathtub-shaped hrfs.
Additionally, the TIEx-W parameters are estimated by using several estimation methods and their performance is explored by detailed simulations for small and large samples. Many authors have addressed different estimators to estimate the parameters of generalized models such as the Weibull–Marshall–Olkin power-Lindley distribution by [12] and the Marshall–Olkin–Weibull exponential distribution by [13].
The rest of this work is organized as follows. Section 2 is devoted to introducing the proposed family. In Section 3, we present a special sub-model of the proposed family. Some characteristics are provided in Section 4. Section 5 is devoted to the estimation of the model parameters. A detailed simulation study is presented in Section 6. A real data application is discussed in Section 7. Bayesian estimation under five loss functions is explored in Section 8. The paper is concluded in Section 9.
2 The TIEx-F family
In recent years, statisticians have shown an increased interest in proposing new families of distributions by introducing additional parameters. In this regard, Mahdavi and Kundu [14] pioneered a new method, called the alpha-power transformation (APT) family, for generating univariate distributions using the following cdf
(1)
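As a hedged illustration of the APT construction, the sketch below applies the transformation in the form usually attributed to Mahdavi and Kundu [14], namely (α^F(x) − 1)/(α − 1) for α > 0, α ≠ 1, and F(x) for α = 1, to a Weibull baseline. The function names and the shape/scale parameterization of the Weibull cdf are assumptions made for this example only.

```python
import math


def weibull_cdf(x, shape, scale):
    """Baseline Weibull cdf, 1 - exp(-(x/scale)^shape) for x > 0 (assumed parameterization)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def apt_cdf(x, alpha, base_cdf):
    """Alpha-power transformation of a baseline cdf (assumed Mahdavi-Kundu form)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    u = base_cdf(x)
    if alpha == 1.0:
        return u  # the transformation reduces to the baseline
    return (alpha ** u - 1.0) / (alpha - 1.0)


# Example: transformed Weibull cdf evaluated at a few points.
baseline = lambda x: weibull_cdf(x, shape=2.0, scale=1.0)
values = [apt_cdf(x, 3.0, baseline) for x in (0.25, 0.5, 1.0, 2.0)]
```

The transformed function maps 0 to 0 and 1 to 1 and is increasing in F(x), so it is again a valid cdf whenever the baseline is.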
The APT family is studied in more detail in [15]. Ahmad et al. [16] proposed another approach, called the extended APT (Ex-APT) family, for generating new distributions, which is specified by the cdf
(2)
Ampadu [17] introduced another method, called the Ampadu APT family, which is defined by the cdf
(3)
In this work, we propose a new class for generating flexible distributions, called the type-I extended-F (TIEx-F) family, which has the following cdf
(4)
where F(x; Ψ) is the cdf of the baseline random variable (rv) depending on the vector of parameters Ψ, and η is an additional parameter.
The pdf corresponding to (4) takes the form
(5)
The new pdf (5) of the TIEx-F family will be most tractable for simple analytical expressions of F(x; Ψ) and f(x; Ψ). Henceforth, a rv X with pdf (5) is denoted by X∼ TIEx-F (η, Ψ).
3 The TIEx-W distribution
Using the cdf of the Weibull (W) distribution, and its pdf,
, where Ψ = (α, γ)⊺, we obtain the cdf of the TIEx-W distribution
(6)
The corresponding pdf of the TIEx-W distribution reduces to
(7)
Fig 1 sketches some density plots for the TIEx-W distribution. It can be seen that the pdf of this model can be reversed-J shaped, left-skewed, right-skewed, or symmetric.
The survival function and the hrf of the TIEx-W are, respectively, given by
(8)
and
(9)
Fig 2 displays some hrf plots for the TIEx-W distribution. It can be seen that the hrf shapes of this model are: increasing, decreasing and modified bathtub.
4 Distributional properties
In this section, we derive some general properties of the TIEx-F family including: quantile function, median, rth moments and moment generating function, shapes of TIEx-W pdf and the order statistics.
4.1 The quantile function
The quantile function (qf) of the TIEx-F class follows by inverting the cdf (4). Thus, we have
(10)
where W0(⋅) is the principal branch of the Lambert W function, and u follows the uniform distribution on (0, 1).
Eq (10) can be adopted to generate random numbers from the TIEx-F family distributions.
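The inverse transform idea can be sketched without the closed form (10): any continuous cdf can be inverted numerically and fed with uniform draws. The example below is a minimal sketch under that assumption; it uses bisection on a generic cdf, and the Weibull cdf (with an assumed shape/scale parameterization) is only a stand-in for the target distribution. All function names are hypothetical.

```python
import math
import random


def weibull_cdf(x, shape, scale):
    """Stand-in cdf (assumed parameterization); replace with the target cdf."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0


def invert_cdf(cdf, u, upper=1.0, iterations=200):
    """Solve cdf(x) = u on x >= 0 by bracket expansion followed by bisection."""
    lo, hi = 0.0, upper
    while cdf(hi) < u:           # grow the bracket until cdf(hi) >= u
        hi *= 2.0
    for _ in range(iterations):  # fixed iteration count, no tolerance tuning
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def draw_sample(cdf, n, rng):
    """Inverse transform sampling: map uniform(0, 1) draws through the inverse cdf."""
    return [invert_cdf(cdf, rng.random()) for _ in range(n)]


rng = random.Random(2023)
cdf = lambda x: weibull_cdf(x, shape=2.0, scale=1.0)
sample = draw_sample(cdf, 5, rng)
```

When a closed-form quantile function is available, calling it directly is faster and more accurate than numerical inversion; the bisection route is mainly useful as a cross-check.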
So, the quantile function for the TIEx-W distribution can be written as
(11)
4.2 The median
The median of the TIEx-F family distributions can be obtained by setting u = 0.5 in (10), and it is given by
and the median of the TIEx-W distribution is given by
4.3 The rth moments and moment generating function
Let X ∼ TIEx-F (η, Ψ); then the rth moment of X takes the form
(12)
By inserting (5) in (12), we obtain
(13)
The Maclaurin series of the exponential function $e^{y}$ is
$$e^{y} = \sum_{k=0}^{\infty} \frac{y^{k}}{k!}. \tag{14}$$
Using y = F(x; Ψ) in (14)
(15)
Hence, the rth moment follows as
(16)
where
and
The moment generating function of the TIEx-F family, MX(t), is given by
The first four moments of the TIEx-F family follow for r = 1, 2, 3, 4. Numerical values for the mean, variance, skewness (Sk) and kurtosis (Kur) of the TIEx-W distribution for some parametric values are reported in Tables 1 and 2. These tables show that the additional parameter provides more flexibility to the TIEx-W distribution in terms of its Sk and Kur.
4.4 Shapes of TIEx-W pdf
The limiting behaviors of the pdf in (7) as x → 0 and x → ∞ are, respectively, given by
This clearly appears in Fig 1.
4.5 Order statistics
Let X1, X2, …, Xn be a random sample from (7) and let X1:n ≤ X2:n ≤ … ≤ Xn:n denote the corresponding order statistics. It is well known that the pdf and the cdf of the rth order statistic, say Xr:n with 1 ≤ r ≤ n, are, respectively, given by
(17)
and
(18)
for k = 1, 2, …, n. It follows from (17) and (18) that the pdf and cdf of the rth order statistic of the TIEx-F family can be reduced to
and
So, the pdf and cdf of the rth order statistic of the TIEx-W model can be reduced to
and
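The general order-statistic results referred to in this subsection can be illustrated numerically. The sketch below uses the textbook formulas, pdf n!/((r−1)!(n−r)!) F^(r−1) (1−F)^(n−r) f and cdf Σ_{k=r}^{n} C(n,k) F^k (1−F)^(n−k), and takes the values F(x) and f(x) of the parent distribution as inputs, so it does not depend on any particular family. The function names are hypothetical.

```python
import math


def order_stat_cdf(F_x, r, n):
    """cdf of the r-th order statistic out of n, given the parent cdf value F_x = F(x)."""
    return sum(
        math.comb(n, k) * F_x ** k * (1.0 - F_x) ** (n - k)
        for k in range(r, n + 1)
    )


def order_stat_pdf(F_x, f_x, r, n):
    """pdf of the r-th order statistic, given parent cdf value F_x and pdf value f_x."""
    const = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    return const * F_x ** (r - 1) * (1.0 - F_x) ** (n - r) * f_x


# Example: minimum (r = 1) and maximum (r = n) of a sample of size 3 at a point where F(x) = 0.5.
cdf_min = order_stat_cdf(0.5, r=1, n=3)
cdf_max = order_stat_cdf(0.5, r=3, n=3)
```

For r = n the cdf reduces to F(x)^n, and for r = 1 it reduces to 1 − (1 − F(x))^n, which gives a quick sanity check.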
5 Estimation for the TIEx-W parameters
In this section, eight estimation methods are considered to estimate the unknown parameters of the TIEx-W model.
5.1 Maximum likelihood method
Consider a random sample from the TIEx-W model with pdf given by (7), denoted by x1, …, xn, and the associated observed order statistics, denoted by x(1), x(2), ⋯, x(n). Then, the log-likelihood function reduces to
(19)
The maximum likelihood estimates (MLEs) of α, γ and η can be determined by maximizing (19) with respect to α, γ and η or by solving the following three nonlinear equations
(20)
(21)
and
(22)
The three Eqs (20)–(22) have no explicit solutions, hence numerical techniques will be employed to obtain the MLEs of the parameters.
5.2 Maximum product of spacings method
The maximum product of spacings estimates (MPSEs) for α, γ and η of the TIEx-W model can be obtained by maximizing the following function with respect to α, γ and η
(23)
The MPSEs of the parameters α, γ and η follow by maximizing Eq (23) or by solving the following three equations simultaneously
and
where
(24)
(25)
and
(26)
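A minimal sketch of the maximum product of spacings criterion is given below. It assumes the usual definition, the mean of the log spacings D_k = F(x_(k)) − F(x_(k−1)) with F(x_(0)) = 0 and F(x_(n+1)) = 1, and it takes the cdf as an argument. The Weibull cdf (assumed shape/scale parameterization) is only a stand-in for the model cdf, and the grid search stands in for a proper numerical optimizer.

```python
import math


def weibull_cdf(x, shape, scale):
    """Stand-in cdf (assumed parameterization); replace with the model cdf."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0


def mps_objective(params, data, cdf):
    """Mean log-spacing of the fitted cdf at the ordered sample; larger is better."""
    xs = sorted(data)
    probs = [0.0] + [cdf(x, *params) for x in xs] + [1.0]
    total = 0.0
    for left, right in zip(probs, probs[1:]):
        spacing = max(right - left, 1e-300)  # guard against ties or zero spacings
        total += math.log(spacing)
    return total / (len(xs) + 1)


# Crude one-dimensional grid search over the scale with the shape held at 1.
data = [0.5, 1.0, 2.0]
grid = [(1.0, scale) for scale in (0.25, 0.5, 1.0, 2.0, 4.0)]
best = max(grid, key=lambda p: mps_objective(p, data, weibull_cdf))
```

Because the spacings sum to one, the objective can never exceed log(1/(n+1)), which is attained only when the fitted cdf values are equally spaced.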
5.3 Least squares and weighted least squares methods
The least squares estimates (LSEs) and weighted least squares estimates (WLSEs) [18] of the TIEx-W parameters can be obtained by minimizing the following function
where
for the WLS method and δk = 1 for the LS method. Practically, the LSEs and the WLSEs of the parameters α, γ and η of the TIEx-W model are obtained by solving the following three equations simultaneously
and
where ϱk(α, γ, η), ϑk(α, γ, η) and φk(α, γ, η) are given by (24)–(26).
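The two least squares criteria can be sketched as one function of the fitted cdf values at the ordered sample. The example assumes the standard form from [18], Σ δ_k [F(x_(k)) − k/(n+1)]², with δ_k = (n+1)²(n+2)/(k(n−k+1)) for WLS and δ_k = 1 for LS; the function name and the list-based interface are hypothetical.

```python
def ls_objective(p, weighted=False):
    """Least squares distance between fitted cdf values and plotting positions.

    p[k-1] is the fitted cdf evaluated at the k-th smallest observation.
    """
    n = len(p)
    total = 0.0
    for k, pk in enumerate(p, start=1):
        if weighted:
            weight = (n + 1) ** 2 * (n + 2) / (k * (n - k + 1))  # WLS weight
        else:
            weight = 1.0                                          # ordinary LS
        total += weight * (pk - k / (n + 1)) ** 2
    return total


# Example: two observations whose fitted cdf values are both 0.5.
ls_value = ls_objective([0.5, 0.5])
wls_value = ls_objective([0.5, 0.5], weighted=True)
```

The estimates are the parameter values that minimize this function; the weights are the reciprocals of the variances of the uniform order statistics, so WLS gives more influence to the extreme observations.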
5.4 Cramér-von-Mises and percentiles methods
The Cramér-von Mises estimates (CVMEs) of the parameters of the TIEx-W model follow by minimizing the following equation with respect to the parameters α, γ and η
(27)
Equivalently, the CVMEs of the parameters α, γ and η are obtained by solving the following three equations
and
where ϱk(α, γ, η), ϑk(α, γ, η) and φk(α, γ, η) are given by (24)–(26).
The percentiles estimation method was proposed by [19, 20]. The percentiles estimates (PCEs) of the TIEx-W parameters α, γ and η can be obtained by minimizing
(28)
where Q(u) denotes the quantile function of the TIEx-W distribution, which has no closed-form expression in elementary functions. Hence, numerical techniques are employed to generate data from the TIEx-W distribution.
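The two criteria of this subsection can be sketched as follows, assuming their usual forms: the Cramér-von Mises distance 1/(12n) + Σ [F(x_(k)) − (2k−1)/(2n)]² and the percentile distance Σ [x_(k) − Q(k/(n+1))]². The function names, the plotting position k/(n+1) and the exponential quantile function used in the example are assumptions for illustration.

```python
import math


def cvm_objective(p):
    """Cramér-von Mises distance; p[k-1] is the fitted cdf at the k-th smallest observation."""
    n = len(p)
    return 1.0 / (12 * n) + sum(
        (pk - (2 * k - 1) / (2 * n)) ** 2 for k, pk in enumerate(p, start=1)
    )


def percentile_objective(sorted_data, quantile):
    """Squared distance between ordered data and fitted quantiles at u_k = k/(n+1)."""
    n = len(sorted_data)
    return sum(
        (x - quantile(k / (n + 1))) ** 2 for k, x in enumerate(sorted_data, start=1)
    )


# Example with a unit exponential quantile function as a stand-in for Q(u).
exp_quantile = lambda u: -math.log(1.0 - u)
cvm_value = cvm_objective([0.25, 0.75])
pce_value = percentile_objective([0.3, 1.2], exp_quantile)
```

Both functions are minimized over the model parameters; the constant 1/(12n) does not affect the minimizer and is kept only so that the value matches the usual statistic.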
5.5 Anderson-Darling and right-tail Anderson-Darling methods
The Anderson-Darling (AD) method yields a type of minimum distance estimator obtained by minimizing the AD statistic. The AD estimates (ADEs) of the TIEx-W parameters α, γ and η can be obtained by minimizing
with respect to α, γ and η, where
. These estimates can also be obtained by solving the following equations
and
where ϱk(α, γ, η), ϑk(α, γ, η) and φk(α, γ, η) are given by (24)–(26).
Similarly, the right-tail Anderson-Darling estimates (RADEs) of the TIEx-W parameters α, γ and η are obtained by minimizing
with respect to α, γ and η. Furthermore, these estimates can also be obtained by solving the following equations simultaneously
and
where ϱk(α, γ, η), ϑk(α, γ, η) and φk(α, γ, η) are given by (24)–(26).
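The Anderson-Darling and right-tail Anderson-Darling criteria can be sketched in terms of the fitted cdf values at the ordered sample. The example assumes the usual forms, A = −n − (1/n) Σ (2k−1)[log F(x_(k)) + log S(x_(n+1−k))] and R = n/2 − 2 Σ F(x_(k)) − (1/n) Σ (2k−1) log S(x_(n+1−k)), where S = 1 − F. The function names are hypothetical.

```python
import math


def ad_objective(p):
    """Anderson-Darling distance; p[k-1] is the fitted cdf at the k-th smallest observation."""
    n = len(p)
    s = sum(
        (2 * k - 1) * (math.log(p[k - 1]) + math.log(1.0 - p[n - k]))
        for k in range(1, n + 1)
    )
    return -n - s / n


def rad_objective(p):
    """Right-tail Anderson-Darling distance, which emphasizes the upper tail."""
    n = len(p)
    s = sum((2 * k - 1) * math.log(1.0 - p[n - k]) for k in range(1, n + 1))
    return n / 2.0 - 2.0 * sum(p) - s / n


# Example: a single observation with fitted cdf value 0.5.
ad_value = ad_objective([0.5])
rad_value = rad_objective([0.5])
```

Both functions require fitted cdf values strictly between 0 and 1; an optimizer wrapped around them should reject parameter values that push any fitted value to the boundary.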
6 Simulation results
In this section, we have carried out an extensive simulation study to assess and compare the performance of the eight frequentist estimators. The methods are explored for n = {20, 50, 100, 200, 400} with parameter values α = (2.75), γ = (0.5, 2.0) and η = (0.67, 1.5). We generate N = 5000 random samples from the TIEx-W distribution using the inverse transform method. The following procedures are adopted to generate the data from the TIEx-W distribution:
- Step 1: Generate random values from the TIEx-W distribution with size n.
- Step 2: Using the samples obtained in step 1, calculate the estimates of α, γ and η via (1) MLEs, (2) MPSEs, (3) LSEs, (4) CVMEs, (5) WLSEs, (6) PCEs, (7) ADEs and (8) RADEs.
- Step 3: Repeat the steps 1 and 2, N times.
For each estimate, we calculate the average estimates of the parameters (AVEs), mean-squared errors (MSEs), absolute biases (ABs), and mean relative errors (MREs). The formulae of these measures are given by
, ABs,
, and MREs,
.
The performance of the considered estimators is evaluated in terms of absolute bias, mean-squared error and mean relative error. Under this approach, the most efficient estimation method is the one whose MSE, MRE and bias values are closest to zero. All simulations are conducted via R software.
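The three simulation steps and the accuracy measures can be sketched as a small Monte Carlo loop. The definitions used here, AB = (1/N) Σ |θ̂ − θ|, MSE = (1/N) Σ (θ̂ − θ)² and MRE = (1/N) Σ |θ̂ − θ|/θ, are the common ones and are assumptions as far as this study's exact formulae are concerned. The exponential sampler and its rate estimator are stand-ins for the TIEx-W generator and the eight estimators, and all names are hypothetical. Python is used here for illustration only.

```python
import random


def summarize(estimates, true_value):
    """Average estimate, absolute bias, mean-squared error and mean relative error."""
    m = len(estimates)
    ave = sum(estimates) / m
    ab = sum(abs(e - true_value) for e in estimates) / m
    mse = sum((e - true_value) ** 2 for e in estimates) / m
    mre = ab / true_value  # equals the mean of |e - theta| / theta for theta > 0
    return {"AVE": ave, "AB": ab, "MSE": mse, "MRE": mre}


def monte_carlo(sampler, estimator, true_value, n, replications, seed=1):
    """Steps 1-3: simulate a sample, estimate, and repeat."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        data = sampler(n, rng)             # Step 1: generate a sample of size n
        estimates.append(estimator(data))  # Step 2: compute the estimate
    return summarize(estimates, true_value)  # Step 3 is the loop itself


# Stand-in model: exponential data with rate 2 and the estimator n / sum(x).
rate = 2.0
sampler = lambda n, rng: [rng.expovariate(rate) for _ in range(n)]
estimator = lambda data: len(data) / sum(data)
small = monte_carlo(sampler, estimator, rate, n=20, replications=300)
large = monte_carlo(sampler, estimator, rate, n=400, replications=300)
```

Comparing `small` and `large` shows the pattern expected of a consistent estimator: the error measures shrink as n grows.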
In Tables 3–5, we report the values of AVEs, MSEs, ABs, and MREs for the WLSEs, LSEs, MLEs, MPSEs, CVMEs, ADEs, RADEs and PCEs. The results show that all estimators are consistent: the MSEs and MREs decrease as the sample size increases, for all parameter combinations. In summary, we conclude that the maximum likelihood (ML) method outperforms all other estimation methods. Therefore, the ML method is considered the optimal method for estimating the TIEx-W parameters.
7 Modeling insurance data
In this section, we illustrate the applicability and superiority of the TIEx-W distribution by comparing its goodness of fit with that of other well-known distributions that have been used for modeling financial and insurance data sets.
The analyzed insurance data represents hospital cost in the state of Wisconsin provided by the Office of the Health Care Information, Wisconsin’s Department of Health and Human Resources. The data is available at https://github.com/almof1hm/Hospital-costs-in-the-state-of-Wisconsin/blob/main/Data.xlsx.
We compare the fit of the proposed model (TIEx-W) with the fits of some competing distributions, namely the Weibull (Wi) [21], Lindley–Weibull (LiW) [22], exponentiated Weibull (EW) [23], Kumaraswamy–Weibull (KwW) [24], α logarithmic transformed Weibull (ALW) [25], Burr (Bur) [26], Lomax (Lo) [27] and beta Weibull (BW) [28] models, and the Weibull special cases of the families defined in (1)–(3), for the insurance data set.
Table 6 lists the MLEs of the parameters, with the corresponding standard errors (SEs) in parentheses, for all fitted models, along with the Kolmogorov-Smirnov (K-S) statistics and p-values for the insurance data set. Since the TIEx-W model has the lowest K-S statistic and the largest p-value among all fitted models, it provides the best fit to the analyzed data set.
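The K-S comparison relies on the largest gap between the empirical cdf and each fitted cdf. A minimal sketch of that statistic, written in terms of the fitted cdf values at the ordered observations, is shown below; the function name and interface are hypothetical, and the p-value computation is left out.

```python
def ks_statistic(p):
    """Kolmogorov-Smirnov distance; p[k-1] is the fitted cdf at the k-th smallest observation."""
    n = len(p)
    return max(
        max(k / n - pk, pk - (k - 1) / n)  # gap just after and just before each jump
        for k, pk in enumerate(p, start=1)
    )


# Example: two observations with fitted cdf values 0.25 and 0.75.
d_stat = ks_statistic([0.25, 0.75])
```

A smaller value means the fitted cdf tracks the empirical cdf more closely, which is the sense in which the lowest K-S statistic indicates the best fit.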
The fitted cdf and Kaplan–Meier survival (KMS) plots are displayed in Fig 3. The PP and box plots are sketched in Fig 4. The plots show that the insurance data has a heavier tail and the TIEx-W distribution fits it very closely.
8 Bayesian estimation from insurance data
In this section, the insurance data is analyzed using the Bayesian analysis. Let the parameters α, γ and η of TIEx-W distribution have independent gamma priors as
where a, b, c, d, e and f are positive. Then, the joint prior density follows as
(29)
We adopt five well-known loss functions (LFs), which are listed in Table 7 with their Bayes estimators and posterior risks. More information can be found in [29].
Now, we derive the posterior probability distribution for complete data. We define the function φ
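Given posterior draws of a positive parameter, Bayes estimates under several loss functions reduce to simple posterior expectations. The sketch below covers four of the loss functions named in the abbreviation list and assumes their standard closed forms: the posterior mean for SELF, 1/E(θ⁻¹) for WSELF, E(θ⁻¹)/E(θ⁻²) for MSELF and √E(θ²) for PLF. Table 7 remains the reference for the definitions actually used, and the function name is hypothetical.

```python
import math


def bayes_estimates(draws):
    """Bayes estimates of a positive parameter from posterior draws (assumed standard forms)."""
    m = len(draws)
    e1 = sum(draws) / m                           # E(theta)
    e2 = sum(t * t for t in draws) / m            # E(theta^2)
    em1 = sum(1.0 / t for t in draws) / m         # E(theta^-1)
    em2 = sum(1.0 / (t * t) for t in draws) / m   # E(theta^-2)
    return {
        "SELF": e1,
        "WSELF": 1.0 / em1,
        "MSELF": em1 / em2,
        "PLF": math.sqrt(e2),
    }


# Example with two artificial draws; real use would pass the MCMC output for one parameter.
estimates = bayes_estimates([1.0, 2.0])
```

For a positive parameter with a non-degenerate posterior, these estimates are ordered MSELF < WSELF < SELF < PLF, so the choice of loss function shifts the point estimate in a predictable direction.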
The joint posterior distribution has the form
(30)
Therefore, the joint posterior density (JPD) of the parameters α, γ and η for complete data is obtained by combining Eq (29) and the likelihood function. Hence, the JPD reduces to
(31)
where K is defined by
(32)
Eq (31) shows that there is no closed form for the Bayes estimates (BEs) under the LFs in Table 7; hence, we use an MCMC procedure based on 10,000 replicates to obtain the BEs of the TIEx-W parameters under these LFs. The posterior samples are extracted using the Gibbs sampling technique. The Bayesian point estimates and posterior risks for the insurance data are listed in Table 8. Table 9 lists 95% credible and HPD intervals for the TIEx-W parameters. The MCMC iterations of α, γ and η are summarized in the plots provided in Figs 5–7. In summary, the BEs of the TIEx-W parameters are consistent, especially under the SELF, in terms of their lowest risks.
Table 10 displays the parameter estimates under the various classical estimation methods and Bayesian estimation, together with the goodness-of-fit statistics, for the insurance data. Furthermore, Fig 8 displays the histogram of the insurance data with the fitted TIEx-W densities, as well as the fitted distribution functions, under the various estimation methods and Bayesian estimation.
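Credible and HPD intervals can both be read off the sorted MCMC draws. The sketch below is a generic illustration under simple assumptions: the equal-tail interval drops the same number of draws from each end, and the HPD interval is taken as the shortest window containing the requested share of the sorted draws. The function names are hypothetical and the input is any list of posterior draws for a single parameter.

```python
import math


def equal_tail_interval(draws, level=0.95):
    """Equal-tail credible interval from posterior draws."""
    xs = sorted(draws)
    m = len(xs)
    cut = int((1.0 - level) / 2.0 * m)  # draws discarded in each tail
    return xs[cut], xs[m - 1 - cut]


def hpd_interval(draws, level=0.95):
    """Shortest interval that contains at least `level` of the posterior draws."""
    xs = sorted(draws)
    m = len(xs)
    k = max(1, math.ceil(level * m))    # number of draws inside the interval
    start = min(range(m - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Example with artificial draws; real use would pass 10,000 MCMC draws.
toy_draws = [0.0, 1.0, 1.1, 1.2, 5.0, 10.0]
toy_hpd = hpd_interval(toy_draws, level=0.5)
```

For a skewed posterior the HPD interval is shorter than the equal-tail interval at the same level, which is why both are commonly reported side by side.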
9 Concluding remarks
In the present paper, we have introduced a new class of heavy-tailed distributions that allows closed-form expressions for the distribution function and some of its basic properties. The proposed class is called the type-I extended-F (TIEx-F) family, and one of its special sub-models, called the TIEx-Weibull (TIEx-W) distribution, is addressed. The TIEx-W parameters are estimated by eight methods of estimation, and a detailed simulation study is provided. Based on our study, we conclude that the maximum likelihood method outperforms all other classical estimation methods. Hence, it is recommended for estimating the parameters of the TIEx-W distribution. The applicability of the TIEx-W distribution has been illustrated using an insurance data set, which is fitted using the TIEx-W distribution and other competing models. The results illustrate that the TIEx-W distribution provides a better fit than the competing distributions. The Bayesian analysis based on the real-life insurance data is also discussed under five loss functions. The analysis shows that the Bayesian estimates of the TIEx-W parameters are consistent, especially under the squared error loss function, in terms of their lowest risks.
Acknowledgments
The authors would like to thank the Editorial Board, and three reviewers for their constructive comments and suggestions which greatly improved the final version of this manuscript.
References
- 1. Scollnik D.P. (2007). On composite Log-normal-Pareto models. Scandinavian Actuarial Journal, 2007, 20–33.
- 2. Cooray K., & Ananda M.M. (2005). Modeling actuarial data with a composite lognormal-Pareto model. Scandinavian Actuarial Journal, 2005, 321–334.
- 3. Scollnik D.P., & Sun C. (2012). Modeling with Weibull–Pareto models. North American Actuarial Journal, 16, 260–272.
- 4. Nadarajah S., & Bakar S.A. (2014). New composite models for the Danish fire insurance data. Scandinavian Actuarial Journal, 2014, 180–187.
- 5. Bakar S.A., Hamzah N.A., Maghsoudi M., & Nadarajah S. (2015). Modeling loss data using composite models. Insurance: Mathematics and Economics, 61, 146–154.
- 6. Calderín-Ojeda E., & Kwok C.F. (2016). Modeling claims data with composite Stoppa models. Scandinavian Actuarial Journal, 2016, 817–836.
- 7. Beirlant J., Matthys G. & Dierckx G. (2001). Heavy-tailed distributions and rating. ASTIN Bulletin, 31, 37–58.
- 8. Coronel-Brizio H.F., & Hernandez-Montoya A.R. (2005). On fitting the Pareto–Levy distribution to stock market index data: selecting a suitable cutoff value. Physica A: Statistical Mechanics and its Applications, 354, 437–449.
- 9. Klugman S.A., Panjer H.H., & Willmot G.E. (2012). Loss models: from data to decisions (Vol. 715). John Wiley & Sons.
- 10. Bhati D., & Ravi S. (2018). On generalized log-Moyal distribution: A new heavy tailed size distribution. Insurance: Mathematics and Economics, 79, 247–259.
- 11. Ghitany M., Gómez-Déniz E., & Nadarajah S. (2018). A New Generalization of the Pareto Distribution and Its Application to Insurance Data. Journal of Risk and Financial Management, 11, 10.
- 12. Al-Babtain A.A., Kumar D., Gemeay A.M., Dey S., & Afify A.Z. (2021). Modeling engineering data using extended power-Lindley distribution: Properties and estimation methods. Journal of King Saud University-Science, 33, p.101582.
- 13. Afify A.Z., Al-Mofleh H., Aljohani H.M., & Cordeiro G.M. (2022). The Marshall–Olkin–Weibull-H family: estimation, simulations, and applications to COVID-19 data. Journal of King Saud University-Science, 34, p.102115.
- 14. Mahdavi A., & Kundu D. (2017). A new method for generating distributions with an application to exponential distribution. Communications in Statistics-Theory and Methods, 46, 6543–6557.
- 15. Mead M.E., Cordeiro G. M., Afify A.Z., & Al-Mofleh H. (2019). The alpha power transformation family: properties and applications. Pakistan Journal of Statistics and Operation Research, 15, 525–545.
- 16. Ahmad Z., Ilyas M., & Hamedani G.G. (2019). The Extended Alpha Power Transformed Family of Distributions: Properties and Applications. Journal of Data Science, 17, 726–741.
- 17. Ampadu C.B. (2019). The Ampadu APT qTX—Family of Distributions Induced by V with an Illustration to Data in the Health Sciences. Annals of Biostatistics & Biometric Applications, 2, 1–5.
- 18. Swain J.S., & Wilson Venkatraman J. (1988). Least squares estimation of distribution function in Johnsons translation system, Journal of Statistical Computation and Simulation, 29, 271–297.
- 19. Kao J. (1958). Computer methods for estimating Weibull parameters in reliability studies. IRE Transactions on Reliability and Quality Control, 13, 15–22.
- 20. Kao J. (1959). A graphical estimation of mixed Weibull parameters in life testing electron tube. Technometrics, 1, 389–407.
- 21. Weibull W. (1939) A Statistical Theory of the Strength of Materials. Generalstabens Litografiska Anstalts Förlag, Stockholm.
- 22. Cordeiro G.M., Afify A.Z., Yousof H.M., Cakmakyapan S., & Ozel G. (2018). The Lindley Weibull distribution: properties and applications. Anais da Academia Brasileira de Ciências, 90, 2579–2598.
- 23. Mudholkar G. and Srivastava D. (1993). Exponentiated Weibull family for analyzing bathtub failure-real data. IEEE Transactions on Reliability, 42, 299–302.
- 24. Cordeiro G. M., Ortega E.M.M., & Nadarajah Saralees. (2010). The Kumaraswamy Weibull distribution with application to failure data. Journal of the Franklin Institute, 347 (8), 1399–1429.
- 25. Dey S., Nassar M., & Kumar D. (2017). α Logarithmic Transformed Family of Distributions with Application. Annals of Data Science, 4, 457–482.
- 26. Burr I.W. (1942). Cumulative frequency functions. Annals of Mathematical Statistics, 13 (2), 215–232.
- 27. Lomax K. S. (1954). Business failures. Another example of the analysis of failure data. Journal of the American Statistical Association, 49 (268), 847–852.
- 28. Lee C., Famoye F., & Olumolade O. (2007). Beta Weibull distribution: some properties and applications to censored data. Journal of Modern Applied Statistical Methods, 6, 173–186.
- 29. Calabria R., & Pulcini G. (1996). Point estimation under asymmetric loss functions for left-truncated exponential samples. Communications in Statistics-Theory and Methods, 25, 585–600.