Abstract
This article proposes a new method for expanding a family of life distributions by adding a parameter to the family, which increases its flexibility. The resulting family is called the extended Modi-G family of distributions, and its general statistical properties are derived. Several methods are presented to estimate the parameters of the proposed family: maximum likelihood, ordinary least squares, weighted least squares, Anderson-Darling, right-tailed Anderson-Darling, Cramér-von Mises, and maximum product of spacings. A special three-parameter sub-model, called the extended Modi exponential distribution, is derived, and the different shapes of its density and hazard functions are shown. Simulated data sets are used to illustrate the behavior of the different estimators of the sub-model's parameters. To illustrate the advantage of the proposed family over other well-known models, data sets from medicine and geology are analyzed.
Citation: Gemeay AM, Alharbi WH, El-Alosey AR (2024) A new power G-family of distributions: Properties, estimation, and applications. PLoS ONE 19(8): e0308094. https://doi.org/10.1371/journal.pone.0308094
Editor: Mazyar Ghadiri Nejad, Cyprus International University Faculty of Engineering: Uluslararasi Kibris Universitesi Muhendislik Fakultesi, TÜRKIYE
Received: May 24, 2024; Accepted: July 17, 2024; Published: August 5, 2024
Copyright: © 2024 Gemeay et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data are within the manuscript.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
In probability distribution theory, the choice of a particular distribution for modeling real-life phenomena may depend on how flexible the distribution is. Tractability is helpful in theory because a tractable distribution is easy to work with, particularly when generating random samples. Flexibility, however, is of primary interest to practitioners: it is preferable to use a distribution that fits the available data set well rather than to transform the data. Consequently, many recent studies have modified and extended the standard theoretical distributions to increase their adaptability and their capacity to model real-life data sets.
Several methods can be used to extend the standard distributions. For example, the flexibility of a distribution can be increased through generalization, that is, by embedding it in an existing generalized family of distributions. Generalization adds one or more shape parameters from the chosen family; these extra parameters change the tail weight of the resulting distribution and can introduce skewness. Generalizing classical distributions is a long-standing practice and is as important as many other practical problems in statistics. Such generalizations introduce additional location, scale, or shape parameters into the original model. This branch of statistics has received considerable attention, and many general classes of distributions have been derived in recent years. Azzalini [1] introduced the skew-normal distribution by adding an extra parameter to the normal distribution to increase its flexibility. Mudholkar and Srivastava [2] proposed a technique for adding an extra parameter to the two-parameter Weibull distribution. Marshall and Olkin [3] introduced another method for adding a parameter to a family of distributions. Eugene et al. [4] introduced the beta generalized family of distributions, which is derived from the logit of a beta random variable and has two extra shape parameters. Cordeiro and de Castro [5] introduced and studied another family of generalized distributions based on the Kumaraswamy distribution. Zografos and Balakrishnan [6] presented a gamma generalized family of distributions, and Ristić and Balakrishnan [7] presented a type II gamma generalized family. Alexander et al. [8] generated the McDonald generalized family of distributions from McDonald random variables. Alzaatreh et al. [9] presented a new method of generating families of continuous distributions, called the T-X family.
Mahdavi and Kundu [10] introduced a new method for deriving statistical distributions. For surveys of methods for generating distributions, see Lee et al. [11], Jones [12], and Ahmad et al. [13].
Although many distributions in the literature can be used to analyze data in various domains, more adaptable distributions that work well in a wide range of circumstances are still needed. The main aim of this paper is to propose a new family of generalized distributions. Several mathematical properties of the family are investigated, and several methods of parameter estimation are considered. A new statistical model is derived using the exponential distribution as the baseline of the proposed family; its additional parameters change the tail length of the exponential distribution and produce right- and left-skewed densities. The behavior of the estimators of the new model was examined via simulation, and all estimation methods showed the consistency property for all measures. Three real data sets were used to demonstrate the applicability of the suggested model compared with other models.
The article is organized as follows. The proposed family is presented in Section 2, and Section 3 derives its statistical properties, including the quantile function, moments, moment generating function, incomplete moments, and Rényi entropy. The maximum likelihood estimation (MLE), ordinary least-squares estimation (OLSE), weighted least-squares estimation (WLSE), Anderson-Darling estimation (ADE), right-tailed Anderson-Darling estimation (RTADE), Cramér-von Mises estimation (CVME), and maximum product of spacings (MPSE) methods for estimating the parameters of the proposed family are described in Section 4. Section 5 defines the extended Modi exponential distribution (ExMED). Section 6 evaluates the estimators of the ExMED parameters through simulation. In Section 7, three real data sets illustrate the performance of the ExMED. Finally, concluding remarks are given in Section 8.
2 Family formulation
Modi et al. [14] introduced the Modi family of probability distributions. Its CDF and PDF are given by
respectively, where x, α, β > 0. They studied the statistical properties of the Modi exponential distribution and applied it to two real data sets.
This section presents a new family of generalized distributions, called the extended Modi-G family of distributions. Its CDF is given by
(1)
where x ∈ R and α, θ > 0. Its PDF is
(2)
where G(x; ϕ) is the CDF of the baseline distribution.
Survival function (SF), hazard rate function (HRF), and reverse hazard rate function (RHRF) of the extended Modi-G family are given by (3)–(5), respectively.
(3)
(4)
(5)
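The functions in (3)–(5) follow from the CDF and PDF through the standard relations S(x) = 1 − F(x), h(x) = f(x)/S(x), and r(x) = f(x)/F(x). A minimal Python sketch, assuming only that the CDF and PDF can be evaluated pointwise; because the explicit forms of (1) and (2) are not reproduced here, an exponential distribution with an arbitrary rate serves purely as a stand-in model, and the function names are illustrative.

```python
import math

def survival(cdf, x):
    """S(x) = 1 - F(x)."""
    return 1.0 - cdf(x)

def hazard(cdf, pdf, x):
    """h(x) = f(x) / S(x)."""
    return pdf(x) / survival(cdf, x)

def reversed_hazard(cdf, pdf, x):
    """r(x) = f(x) / F(x)."""
    return pdf(x) / cdf(x)

# Stand-in model only (NOT the extended Modi-G family): exponential with rate lam.
lam = 1.5
exp_cdf = lambda x: 1.0 - math.exp(-lam * x)
exp_pdf = lambda x: lam * math.exp(-lam * x)

for x in (0.5, 1.0, 2.0):
    print(f"x={x}: S={survival(exp_cdf, x):.4f}, "
          f"h={hazard(exp_cdf, exp_pdf, x):.4f}, "
          f"r={reversed_hazard(exp_cdf, exp_pdf, x):.4f}")
```

For the exponential stand-in the hazard is constant and equal to the rate; supplying implementations of (1) and (2) for a chosen baseline G would evaluate (3)–(5) in the same way.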
3 Mathematical properties
This section provides some mathematical properties of the extended Modi-G family of distributions, including its linear representation, quantile function, moments, moment generating function, incomplete moments, and entropy.
3.1 Linear representation
A useful linear representation of CDF (1) and PDF (2) is introduced in this subsection. For −1 ≤ x ≤ 1, we have
(6)
Applying the power series (6) to (1) and (2), we obtain
(7)
(8)
3.2 Quantile function
Let X be a random variable with CDF (1). Then the quantile function (QF) of X is given by
(9)
where G−1 is the QF of the baseline distribution and u ∈ (0, 1). Setting u = 0.25, 0.5, and 0.75 gives the first, second, and third quartiles, respectively, from which Bowley's skewness (BS) [15] is obtained as
(10)
and Moors' kurtosis (MK) [16], which is based on the octiles (u = 1/8, 2/8, …, 7/8), as
(11)
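Both measures need only quantile evaluations. A small Python sketch, assuming the standard forms BS = [Q(3/4) + Q(1/4) − 2Q(1/2)] / [Q(3/4) − Q(1/4)] and MK = [Q(7/8) − Q(5/8) + Q(3/8) − Q(1/8)] / [Q(6/8) − Q(2/8)]; the exponential quantile function is only a stand-in for (9), chosen because its values are known in closed form.

```python
import math

def bowley_skewness(q):
    """Bowley's skewness from the quartiles of a quantile function q(u)."""
    q1, q2, q3 = q(0.25), q(0.50), q(0.75)
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)

def moors_kurtosis(q):
    """Moors' kurtosis from the octiles q(1/8), ..., q(7/8)."""
    e1, e2, e3, _, e5, e6, e7 = (q(i / 8.0) for i in range(1, 8))
    return (e7 - e5 + e3 - e1) / (e6 - e2)

# Stand-in quantile function (exponential with rate lam), NOT Eq (9) or (26).
lam = 2.0
exp_quantile = lambda u: -math.log(1.0 - u) / lam

print(bowley_skewness(exp_quantile))  # log(4/3)/log(3), about 0.262
print(moors_kurtosis(exp_quantile))   # log(21/5)/log(3), about 1.306
```

Both measures are location- and scale-free, which is why the stand-in result does not depend on the rate.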
3.3 Moments and moment generating function
Moments play an important role in statistical analysis, especially in applications. The rth moment of the extended Modi-G family of distributions is defined as
(12)
Using (8) in (12), we obtain
(13)
The moment generating function (MGF) of extended Modi-G random variable X is given by
(14)
Replacing t with it, where i = √−1, gives the characteristic function of the extended Modi-G family of distributions.
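As a numerical cross-check of series expressions such as (13) and (14), the moments and MGF can also be obtained by integrating directly. A sketch using composite Simpson's rule, assuming a positive support truncated at a finite upper limit beyond which the density is negligible; the exponential density is only a stand-in for (2), and the helper names are illustrative.

```python
import math

def simpson(func, a, b, n=4000):
    """Composite Simpson's rule on [a, b]; n is rounded up to an even number."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0

def raw_moment(pdf, r, upper):
    """E[X^r], integrating x^r f(x) over (0, upper]."""
    return simpson(lambda x: x ** r * pdf(x), 0.0, upper)

def mgf(pdf, t, upper):
    """M_X(t) = E[exp(tX)], meaningful only where the integral converges."""
    return simpson(lambda x: math.exp(t * x) * pdf(x), 0.0, upper)

# Stand-in density (exponential, rate lam), NOT Eq (2).
lam = 2.0
exp_pdf = lambda x: lam * math.exp(-lam * x)

print(raw_moment(exp_pdf, 2, upper=20.0))  # close to 2 / lam**2 = 0.5
print(mgf(exp_pdf, 0.5, upper=20.0))       # close to lam / (lam - t) = 4/3
```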
3.4 Incomplete moments
Let X be a random variable with PDF (2). Then the rth incomplete moment of X is given by
The first incomplete moment of X is given by
(15)
Using Eq (15), we obtain the Lorenz, Bonferroni, and Zenga curves [17], respectively, as follows
where x_p satisfies F(x_p; α, θ, ϕ) = p.
Also, using Eqs (1) and (15), we can determine the mean residual life (MRL) and the mean inactivity time (MIT), respectively, as follows
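All five quantities are built from the first incomplete moment m1(t). A Python sketch, assuming the usual definitions L(p) = m1(x_p)/μ, B(p) = L(p)/p, Z(p) = 1 − (1 − p) m1(x_p) / [p (μ − m1(x_p))], MRL(t) = [μ − m1(t)]/S(t) − t, and MIT(t) = t − m1(t)/F(t); the exponential model is a stand-in whose closed-form answers make the output checkable.

```python
import math

def simpson(func, a, b, n=4000):
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0

def first_incomplete_moment(pdf, t):
    """m1(t) = integral of x f(x) over (0, t]; support assumed to start at 0."""
    return simpson(lambda x: x * pdf(x), 0.0, t)

def inequality_curves(pdf, quantile, mean, p):
    """Lorenz, Bonferroni, and Zenga curves at p, with x_p = quantile(p)."""
    m1 = first_incomplete_moment(pdf, quantile(p))
    lorenz = m1 / mean
    bonferroni = lorenz / p
    zenga = 1.0 - (1.0 - p) * m1 / (p * (mean - m1))
    return lorenz, bonferroni, zenga

def mean_residual_life(pdf, cdf, mean, t):
    """MRL(t) = (mean - m1(t)) / S(t) - t."""
    return (mean - first_incomplete_moment(pdf, t)) / (1.0 - cdf(t)) - t

def mean_inactivity_time(pdf, cdf, t):
    """MIT(t) = t - m1(t) / F(t)."""
    return t - first_incomplete_moment(pdf, t) / cdf(t)

# Stand-in model (exponential, rate lam), NOT the extended Modi-G family.
lam = 2.0
exp_pdf = lambda x: lam * math.exp(-lam * x)
exp_cdf = lambda x: 1.0 - math.exp(-lam * x)
exp_quantile = lambda u: -math.log(1.0 - u) / lam

print(inequality_curves(exp_pdf, exp_quantile, 1.0 / lam, 0.5))
print(mean_residual_life(exp_pdf, exp_cdf, 1.0 / lam, 1.0))  # memoryless: close to 1/lam
```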
3.5 Entropy
The entropy of a random variable X measures the randomness (uncertainty) of its probability distribution, and different types of entropy are not equally useful for all applications.
Let X be a continuous random variable with PDF f(x). Then the Rényi entropy [18] is given by
(16)
From PDF (2), we have
where
then
(17)
Substituting Eq (17) into Eq (16), we obtain the Rényi entropy of the extended Modi-G family as follows
(18)
Let X be a continuous random variable from the extended Modi-G family. Then its Tsallis entropy is given by
Let X be a continuous random variable from the extended Modi-G family. Then its Shannon entropy [19] is given by
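The three entropies can be evaluated numerically for any density, which is useful for checking a closed-form expression such as (18). A sketch assuming the standard definitions: Rényi (1/(1 − δ)) log ∫ f^δ, Tsallis (1 − ∫ f^δ)/(δ − 1), and Shannon −∫ f log f, with δ > 0, δ ≠ 1. The order symbol δ, the truncation limit, and the exponential stand-in density are assumptions for illustration.

```python
import math

def simpson(func, a, b, n=4000):
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0

def renyi_entropy(pdf, delta, upper):
    """(1 / (1 - delta)) * log(integral of f^delta); delta > 0, delta != 1."""
    integral = simpson(lambda x: pdf(x) ** delta, 0.0, upper)
    return math.log(integral) / (1.0 - delta)

def tsallis_entropy(pdf, delta, upper):
    """(1 - integral of f^delta) / (delta - 1); delta > 0, delta != 1."""
    integral = simpson(lambda x: pdf(x) ** delta, 0.0, upper)
    return (1.0 - integral) / (delta - 1.0)

def shannon_entropy(pdf, upper):
    """-integral of f log f, treating 0 * log 0 as 0."""
    def integrand(x):
        fx = pdf(x)
        return -fx * math.log(fx) if fx > 0.0 else 0.0
    return simpson(integrand, 0.0, upper)

# Stand-in density (exponential, rate lam), NOT Eq (2).
lam = 2.0
exp_pdf = lambda x: lam * math.exp(-lam * x)

print(shannon_entropy(exp_pdf, 20.0))       # close to 1 - log(lam)
print(renyi_entropy(exp_pdf, 3.0, 20.0))    # close to log(4/3) / (1 - 3)
print(tsallis_entropy(exp_pdf, 3.0, 20.0))  # close to -1/6
```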
4 Estimation methods
In this section, we introduce several methods for estimating the unknown parameters of the extended Modi-G family of distributions: maximum likelihood estimation (MLE), ordinary least-squares estimation (OLSE), weighted least-squares estimation (WLSE), Anderson-Darling estimation (ADE), right-tailed Anderson-Darling estimation (RTADE), Cramér-von Mises estimation (CVME), and maximum product of spacings estimation (MPSE).
4.1 Maximum likelihood estimation
Maximum likelihood is the most common method for estimating unknown parameters (for more details, see [20]). Let X1, X2, …, Xn be a random sample from PDF (2). Then the log-likelihood function is given by
(19)
Differentiating Eq (19) with respect to each parameter and setting the resulting equations to zero yields the MLEs. These partial derivatives are as follows
The second-order partial derivatives are computed to construct the Hessian matrix of the proposed model as follows
Inverting the negative of the Hessian matrix (the observed information matrix) evaluated at the MLEs gives the approximate covariance matrix of the estimators, and the square roots of its diagonal elements are the standard errors of the estimators.
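When analytic second derivatives are inconvenient to code, the observed-information step can be done numerically. A minimal sketch, assuming only that the log-likelihood can be evaluated at any parameter vector: the Hessian is approximated by central finite differences, and the negative Hessian is inverted. A normal log-likelihood with closed-form MLEs stands in for Eq (19) so that the standard errors can be checked; for the extended Modi-G family one would pass an implementation of (19) and its numerically obtained maximizer instead. All helper names are illustrative.

```python
import math

def numerical_hessian(loglik, theta, h=1e-3):
    """Central finite-difference Hessian of loglik at theta."""
    k = len(theta)
    hess = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            def shifted(di, dj):
                t = list(theta)
                t[i] += di
                t[j] += dj
                return loglik(t)
            hess[i][j] = (shifted(h, h) - shifted(h, -h)
                          - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)
    return hess

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular or nearly singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def standard_errors(loglik, theta_hat, h=1e-3):
    """Square roots of the diagonal of the inverse observed information."""
    hess = numerical_hessian(loglik, theta_hat, h)
    observed_info = [[-v for v in row] for row in hess]
    cov = invert(observed_info)
    return [math.sqrt(cov[i][i]) for i in range(len(theta_hat))]

# Stand-in model (normal), NOT Eq (19): closed-form MLEs allow a check.
data = [2.1, 1.9, 3.2, 2.8, 2.5, 1.7, 2.2, 3.0]
n = len(data)

def normal_loglik(theta):
    mu, sigma = theta
    ss = sum((x - mu) ** 2 for x in data)
    return -n * math.log(sigma) - ss / (2.0 * sigma ** 2) - 0.5 * n * math.log(2.0 * math.pi)

mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
print(standard_errors(normal_loglik, [mu_hat, sigma_hat]))
# theory: sigma_hat / sqrt(n) and sigma_hat / sqrt(2 n)
```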
4.2 Ordinary Least-Squares and Weighted-Least Squares estimation
Let x1:n ≤ x2:n ≤ … ≤ xn:n be the order statistics of a random sample of size n from the extended Modi-G family of distributions. The OLSEs are obtained by minimizing
(20)
with respect to the parameters, that is, by solving simultaneously the non-linear equations obtained by setting the partial derivatives of Eq (20) to zero (for further details, see [21]).
Similarly, the WLSEs are obtained by minimizing the following function
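Since neither criterion is reproduced in the text above, the sketch below assumes the forms commonly used for these estimators: OLS minimizes Σ[F(x_{i:n}) − i/(n+1)]², and WLS weights each term by (n+1)²(n+2)/[i(n − i + 1)]. To keep it short, a one-parameter exponential model stands in for the family and a golden-section search does the minimization; for the three-parameter ExMED a multivariate optimizer would be needed. The data are synthetic and chosen so the answer is known.

```python
import math

def ols_objective(cdf, data):
    """Sum over i of [F(x_(i)) - i/(n+1)]^2 on the ordered sample."""
    x = sorted(data)
    n = len(x)
    return sum((cdf(xi) - i / (n + 1)) ** 2 for i, xi in enumerate(x, start=1))

def wls_objective(cdf, data):
    """Same residuals weighted by (n+1)^2 (n+2) / [i (n - i + 1)]."""
    x = sorted(data)
    n = len(x)
    return sum((n + 1) ** 2 * (n + 2) / (i * (n - i + 1))
               * (cdf(xi) - i / (n + 1)) ** 2
               for i, xi in enumerate(x, start=1))

def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function of one variable on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

# Stand-in one-parameter model: exponential with rate lam (NOT the ExMED).
def exp_cdf_factory(lam):
    return lambda x: 1.0 - math.exp(-lam * x)

# Synthetic data placed exactly at the plotting positions of an exponential
# with rate 1.7, so both criteria are minimized (at zero) by lam = 1.7.
n = 10
data = [-math.log(1.0 - i / (n + 1)) / 1.7 for i in range(1, n + 1)]

lam_ols = golden_section_min(lambda lam: ols_objective(exp_cdf_factory(lam), data), 0.01, 10.0)
lam_wls = golden_section_min(lambda lam: wls_objective(exp_cdf_factory(lam), data), 0.01, 10.0)
print(lam_ols, lam_wls)  # both close to 1.7
```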
4.3 Anderson-Darling and right-tailed Anderson-Darling estimation
The ADEs of the unknown parameters of the extended Modi-G family of distributions are obtained by minimizing the following function
Similarly, the RTADEs of the parameters can be calculated by minimizing the following function
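Because the two criteria are not reproduced above, the sketch assumes the standard forms A = −n − (1/n) Σ (2i − 1)[log F(x_{i:n}) + log S(x_{n+1−i:n})] and R = n/2 − 2 Σ F(x_{i:n}) − (1/n) Σ (2i − 1) log S(x_{n+1−i:n}). Either function can be handed to a numerical minimizer over the parameters. The exponential CDF and the small sample are made-up stand-ins.

```python
import math

def ad_objective(cdf, data):
    """Anderson-Darling criterion evaluated at the ordered sample."""
    x = sorted(data)
    n = len(x)
    u = [cdf(v) for v in x]
    # u[n - i] is F(x_(n+1-i)) because Python lists are 0-based.
    total = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
                for i in range(1, n + 1))
    return -n - total / n

def rtad_objective(cdf, data):
    """Right-tailed Anderson-Darling criterion evaluated at the ordered sample."""
    x = sorted(data)
    n = len(x)
    u = [cdf(v) for v in x]
    total = sum((2 * i - 1) * math.log(1.0 - u[n - i]) for i in range(1, n + 1))
    return n / 2.0 - 2.0 * sum(u) - total / n

# Illustrative stand-ins: exponential CDF with rate 0.8 and a small made-up sample.
exp_cdf = lambda x: 1.0 - math.exp(-0.8 * x)
sample = [0.3, 1.2, 0.7, 2.5, 0.9, 1.8, 0.1, 3.4]

print(ad_objective(exp_cdf, sample), rtad_objective(exp_cdf, sample))
```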
4.4 Cramér-von Mises estimation
The CVMEs of the unknown parameters of the extended Modi-G family of distributions are obtained by minimizing the following function (for more details, see [22])
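A short sketch of the Cramér-von Mises criterion, assuming its standard form C = 1/(12n) + Σ [F(x_{i:n}) − (2i − 1)/(2n)]². The Uniform(0, 1) CDF is used only as a sanity check: observations placed at (2i − 1)/(2n) attain the smallest possible value, 1/(12n).

```python
def cvm_objective(cdf, data):
    """Cramer-von Mises criterion: 1/(12n) + sum of [F(x_(i)) - (2i-1)/(2n)]^2."""
    x = sorted(data)
    n = len(x)
    return 1.0 / (12 * n) + sum((cdf(xi) - (2 * i - 1) / (2.0 * n)) ** 2
                                for i, xi in enumerate(x, start=1))

# Sanity check with the Uniform(0, 1) CDF.
uniform_cdf = lambda u: min(max(u, 0.0), 1.0)
n = 5
points = [(2 * i - 1) / (2.0 * n) for i in range(1, n + 1)]
print(cvm_objective(uniform_cdf, points), 1.0 / (12 * n))
```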
4.5 Maximum product of spacings estimation
The maximum product of spacings (MPSE) [23] method is used to estimate the parameters of continuous univariate models as an alternative to the ML method. The uniform spacings of a random sample of size n from the extended Modi-G family can be defined by
where Di denotes the uniform spacings, F(x0) = 0, F(xn+1) = 1, and the spacings sum to one. The MPS estimators of the parameters are obtained by maximizing
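A sketch of the MPS objective, assuming the usual form (1/(n+1)) Σ log D_i with D_i = F(x_{i:n}) − F(x_{i−1:n}), F(x_0) = 0, and F(x_{n+1}) = 1. Tied observations produce a zero spacing; how the paper handles ties is not stated, and here such parameter values are simply rejected. The uniform check uses the fact that spacings summing to one have the largest mean log when they are all equal.

```python
import math

def mps_objective(cdf, data):
    """Mean log-spacing (1/(n+1)) * sum of log D_i, with F(x_0)=0 and F(x_{n+1})=1."""
    x = sorted(data)
    n = len(x)
    u = [0.0] + [cdf(v) for v in x] + [1.0]
    spacings = [u[i] - u[i - 1] for i in range(1, n + 2)]
    if min(spacings) <= 0.0:
        # A zero spacing (ties or a degenerate fit) makes the log undefined.
        return -math.inf
    return sum(math.log(d) for d in spacings) / (n + 1)

# Check: under the Uniform(0, 1) CDF, points i/(n+1) give equal spacings 1/(n+1)
# and hence the largest attainable value, -log(n+1).
uniform_cdf = lambda u: u
n = 4
points = [i / (n + 1) for i in range(1, n + 1)]
print(mps_objective(uniform_cdf, points), -math.log(n + 1))
```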
5 A special sub-model
In this section, we define a three-parameter sub-model of the proposed family, called the extended Modi exponential distribution (ExMED), by taking the baseline distribution to be the exponential distribution. The CDF and PDF of the ExMED are given, respectively, by
(21)
(22)
Its SF, HRF, and RHRF are, respectively, given by
(23)
(24)
(25)
Figs 1 and 2 display the PDF and HRF plots of the ExMED, respectively. As these figures demonstrate, the ExMED can accommodate decreasing, decreasing-constant, increasing, and increasing-constant hazard rate functions. In addition, its densities can be symmetric, left-skewed, right-skewed, J-shaped, or reversed-J-shaped.
The quantile function of ExMED is given by
(26)
Let u ∼ Uniform(0, 1). Then, using the QF of the ExMED, random data sets of size n can be generated from this distribution via the formula
(27)
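Eq (27) is an instance of inverse-transform sampling. A generic sketch that inverts any continuous CDF on (0, ∞) by bracketing and bisection; it could serve as a cross-check for an implementation of (26) and (27), or be used directly when only the CDF is coded. The helper names are hypothetical, and the exponential CDF is a stand-in for (21).

```python
import math
import random

def quantile_by_bisection(cdf, u, lo=0.0, hi=1.0, tol=1e-12, max_iter=200):
    """Solve cdf(x) = u for x on a positive support by bracketing and bisection."""
    while cdf(hi) < u:          # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def inverse_transform_sample(cdf, n, seed=None):
    """Draw n values by plugging Uniform(0, 1) variates into the inverted CDF."""
    rng = random.Random(seed)
    return [quantile_by_bisection(cdf, rng.random()) for _ in range(n)]

# Stand-in CDF (exponential, rate lam), NOT Eq (21).
lam = 0.5
exp_cdf = lambda x: 1.0 - math.exp(-lam * x)

draws = inverse_transform_sample(exp_cdf, 5, seed=1)
print(draws)
print(quantile_by_bisection(exp_cdf, 0.5), math.log(2.0) / lam)  # should agree
```

Bisection is slower than a closed-form QF but needs only the CDF and always converges for a continuous, increasing CDF.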
Using Eqs (10), (11), and (26), we can determine the BS and MK of the ExMED, respectively. Fig 3 shows the skewness and kurtosis plots of the ExMED for b = 2 and several values of α and θ.
6 Simulation results of estimation methods
We explore the performance of the aforementioned estimation methods in estimating the ExMED parameters through simulation. We consider sample sizes n = 20, 70, 100, 250, and 1000 and various parameter values. For each combination, we generate N = 1000 random samples from the ExMED and compute the average estimates (AESTs), average absolute biases (ABs), average mean square errors (AMSEs), and average mean relative errors (AMREs) using the R software.
The AESTs, ABs, AMSEs, and AMREs are calculated, respectively, by the following equations:
where θ = (α, θ, b)′.
The results of the simulation study, including the AESTs, ABs, AMSEs, and AMREs, are reported in Tables 1–4. The ∑ Ranks row gives the partial sum of the ranks of the ABs, AMSEs, and AMREs, and a superscript indicates the rank of each estimator among all the estimators for that metric.
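The summary measures and the rank sums can be sketched as follows, assuming the definitions usually used in such studies (AEST = mean of the estimates, AB = mean |θ̂ − θ|, AMSE = mean (θ̂ − θ)², AMRE = mean |θ̂ − θ|/θ), since the equations above are not reproduced. Rank 1 goes to the smallest value; the paper does not state its tie rule, so ties here are broken by sort order. The metric values in the example are made up.

```python
def simulation_metrics(estimates, true_value):
    """AEST, AB, AMSE, and AMRE of one parameter over N Monte Carlo replicates."""
    N = len(estimates)
    errors = [e - true_value for e in estimates]
    return {
        "AEST": sum(estimates) / N,
        "AB": sum(abs(d) for d in errors) / N,
        "AMSE": sum(d * d for d in errors) / N,
        "AMRE": sum(abs(d) / true_value for d in errors) / N,
    }

def sum_of_ranks(results, metrics=("AB", "AMSE", "AMRE")):
    """Rank methods per metric (1 = smallest value) and add the ranks up.
    Ties are broken by sort order, which may differ from the rule in the paper."""
    totals = {method: 0 for method in results}
    for metric in metrics:
        ordered = sorted(results, key=lambda m: results[m][metric])
        for rank, method in enumerate(ordered, start=1):
            totals[method] += rank
    return totals

print(simulation_metrics([1.0, 2.0, 3.0], true_value=2.0))

# Made-up metric values for three methods, for illustration only.
toy = {
    "MLE":  {"AB": 0.10, "AMSE": 0.020, "AMRE": 0.05},
    "MPSE": {"AB": 0.08, "AMSE": 0.010, "AMRE": 0.04},
    "OLSE": {"AB": 0.20, "AMSE": 0.050, "AMRE": 0.10},
}
print(sum_of_ranks(toy))  # {'MLE': 6, 'MPSE': 3, 'OLSE': 9}
```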
The following observations can be drawn from Tables 1–4.
- All the estimators show the consistency property; that is, the AMSEs decrease as the sample size increases.
- The ABs of all estimates decrease as n increases for all estimation methods.
- The AMREs of all estimates decrease as n increases for all estimation methods.
- In terms of the performance of the estimation methods, the MPSEs are the best estimators, producing the smallest ABs, AMSEs, and AMREs for most of the configurations considered in our study. The MLEs are the next best, followed by the ADEs. The overall positions of the estimators are presented in Table 5, which confirms the superiority of the MPSE method. In summary, based on Table 5, the performance ordering of the estimators from best to worst for all parameter combinations is MPSE, MLE, ADE, WLSE, RTADE, CVME, and OLSE.
7 Data analysis
In this section, we use three real data sets from medicine and geology to illustrate the superiority of the proposed model over other related models in fitting these data. The first data set consists of the remission times (in months) of a random sample of 128 bladder cancer patients, reported by Lee and Wang [24]. The second data set consists of measurements on patients with malignant melanoma whose tumors were surgically removed at the Department of Plastic Surgery, University Hospital of Odense, Denmark, between 1962 and 1977. It contains 7 variables, each with 205 observations; we study the sixth variable (thickness: tumour thickness in mm). It was obtained from Andersen et al. [25]. The third data set gives peak accelerations measured at various observation stations for 23 earthquakes in California and was reported by Joyner and Boore [26]. It contains 5 variables, each with 182 observations; we study the fourth variable (dist: station-hypocenter distance in km). The numerical values of the data sets are given in Tables 6–8, respectively.
We compare the proposed distribution with several well-known, related competing distributions: the Modi exponential distribution (MED) [14], modified Kies exponential distribution (MKED) [27], alpha power exponential distribution (APED) [10], exponential distribution (ED), exponentiated exponential distribution (ExED), generalized log-logistic exponential distribution (GLLED) [28], linear exponential distribution (LNED) [29], logistic exponential distribution (LED) [30], Marshall–Olkin exponential distribution (MOED) [3], Nadarajah–Haghighi exponential distribution (NHED) [31], odd exponentiated half logistic exponential distribution (OExHLED) [32], odd inverse Pareto exponential distribution (OIPRED) [33], transmuted exponential distribution (TED) [29], and transmuted generalized exponential distribution (TGED) [34].
The competing models are compared using discrimination measures such as the Akaike information criterion (AKIC), consistent Akaike information criterion (CAKIC), and Hannan–Quinn information criterion (HAQUIC). Further discrimination measures include the Anderson-Darling (AD), Cramér–von Mises (CV), and Kolmogorov–Smirnov (KS) statistics, together with the KS p-value (KSPV).
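A sketch of how the information criteria and the KS distance can be computed from a fitted model. The text does not give the CAKIC formula, so Bozdogan's consistent AIC, −2ℓ + k(log n + 1), is assumed here; some papers instead use the small-sample corrected AIC, −2ℓ + 2k + 2k(k + 1)/(n − k − 1). The KS p-value is omitted because it requires the Kolmogorov distribution. The inputs are illustrative only.

```python
import math

def information_criteria(loglik, k, n):
    """AIC, consistent AIC (Bozdogan form, assumed), and Hannan-Quinn criterion."""
    aic = -2.0 * loglik + 2.0 * k
    caic = -2.0 * loglik + k * (math.log(n) + 1.0)
    hqic = -2.0 * loglik + 2.0 * k * math.log(math.log(n))
    return aic, caic, hqic

def ks_statistic(cdf, data):
    """Kolmogorov-Smirnov distance between the empirical CDF and a fitted CDF."""
    x = sorted(data)
    n = len(x)
    u = [cdf(v) for v in x]
    d_plus = max(i / n - ui for i, ui in enumerate(u, start=1))
    d_minus = max(ui - (i - 1) / n for i, ui in enumerate(u, start=1))
    return max(d_plus, d_minus)

# Illustrative inputs only: a maximized log-likelihood of -10 with k = 2 parameters
# and n = 100 observations, and a tiny sample checked against the Uniform(0, 1) CDF.
print(information_criteria(-10.0, 2, 100))
print(ks_statistic(lambda v: v, [0.1, 0.4, 0.7]))  # about 0.3
```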
The MLEs and the analytical measures are computed using Wolfram Mathematica version 12.0. Tables 9–14 give the analytical measures along with the MLEs and their standard errors for the three data sets. The results in these tables indicate that the ExMED provides better fits than the competing models and could be chosen as an adequate model for analyzing medicine (cancer) and geology (earthquake) data sets.
The fitted PDF, CDF, SF, and probability-probability (P–P) plots of the ExMED for the three data sets are shown in Fig 4. Furthermore, we use the seven estimation approaches discussed in Section 4 to estimate the ExMED parameters. For each of the three data sets, Table 15 reports the parameter estimates obtained by these approaches, together with the negative log-likelihood and goodness-of-fit measures. Based on the KS and KSPV values listed in Table 15, we conclude that all seven estimation methods perform well in fitting the three data sets. The P–P plots and the histograms of the three data sets with the fitted ExMED densities for the various estimation methods are shown in Figs 5–7, respectively, and support the results in Table 15. Fig 8 provides the TTT plots and the estimated HRFs of the ExMED for the three data sets. They reveal that the estimated ExMED HRFs are unimodal, which agrees with the TTT plot of each data set. The existence and uniqueness of the estimated parameters of the proposed model are shown graphically in Fig 9 for the three data sets. These estimates were calculated using the NMaximize function in Wolfram Mathematica version 12.0, which attempts to find a global maximum of the objective function subject to the given constraints. Also, Fig 10 shows that these estimated parameters maximize the log-likelihood function for the three data sets.
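The TTT plots rest on the empirical scaled total time on test transform. A minimal sketch, assuming the usual definition T(i/n) = [Σ_{j≤i} x_{j:n} + (n − i) x_{i:n}] / Σ_{j≤n} x_{j:n}. As a rule of thumb, a curve that is concave and then convex suggests a unimodal (upside-down bathtub) hazard, a concave curve an increasing hazard, and a convex curve a decreasing one. The three-point sample is made up so the output can be checked by hand.

```python
def scaled_ttt(data):
    """Empirical scaled TTT points (i/n, T(i/n)) for a positive sample."""
    x = sorted(data)
    n = len(x)
    total = sum(x)
    points = []
    partial = 0.0
    for i, xi in enumerate(x, start=1):
        partial += xi
        points.append((i / n, (partial + (n - i) * xi) / total))
    return points

print(scaled_ttt([3.0, 1.0, 2.0]))  # points (1/3, 1/2), (2/3, 5/6), (1, 1)
```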
8 Conclusion
A new family of life distributions, called the extended Modi-G family, is presented, and general expressions for several of its mathematical properties are derived, including the quantile function, moments, moment generating function, incomplete moments, inequality curves, Rényi entropy, and Shannon entropy. The maximum likelihood, ordinary least squares, weighted least squares, Anderson-Darling, right-tailed Anderson-Darling, Cramér-von Mises, and maximum product of spacings methods were discussed for estimating the model parameters. A special sub-model, called the extended Modi exponential distribution, was derived. Its density can be symmetric, left-skewed, right-skewed, increasing, J-shaped, or reversed-J-shaped, and its hazard rate can be upside-down bathtub, decreasing, decreasing-constant, increasing, or increasing-constant. Three real data sets were analyzed, illustrating the superiority of the extended Modi exponential distribution over the compared distributions in fitting these data.
References
- 1. Azzalini A. A class of distributions which includes the normal ones. Scandinavian journal of statistics. 1985 Jan 1:171–8.
- 2. Mudholkar GS, Srivastava DK. Exponentiated Weibull family for analyzing bathtub failure-rate data. IEEE transactions on reliability. 1993 Jun;42(2):299–302.
- 3. Marshall AW, Olkin I. A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families. Biometrika. 1997 Sep 1;84(3):641–52.
- 4. Eugene N, Lee C, Famoye F. Beta-normal distribution and its applications. Communications in Statistics-Theory and methods. 2002 May 14;31(4):497–512.
- 5. Cordeiro GM, De Castro M. A new family of generalized distributions. Journal of statistical computation and simulation. 2011 Jul 1;81(7):883–98.
- 6. Zografos K, Balakrishnan N. On families of beta-and generalized gamma-generated distributions and associated inference. Statistical methodology. 2009 Jul 1;6(4):344–62.
- 7. Ristić MM, Balakrishnan N. The gamma-exponentiated exponential distribution. Journal of statistical computation and simulation. 2012 Aug 1;82(8):1191–206.
- 8. Alexander C, Cordeiro GM, Ortega EM, Sarabia JM. Generalized beta-generated distributions. Computational Statistics & Data Analysis. 2012 Jun 1;56(6):1880–97.
- 9. Alzaatreh A, Lee C, Famoye F. A new method for generating families of continuous distributions. Metron. 2013 Jun;71(1):63–79.
- 10. Mahdavi A, Kundu D. A new method for generating distributions with an application to exponential distribution. Communications in Statistics-Theory and Methods. 2017 Jul 3;46(13):6543–57.
- 11. Lee C, Famoye F, Alzaatreh AY. Methods for generating families of univariate continuous distributions in the recent decades. Wiley Interdisciplinary Reviews: Computational Statistics. 2013 May;5(3):219–38.
- 12. Jones MC. On families of distributions with shape parameters. International Statistical Review. 2015 Aug;83(2):175–92.
- 13. Ahmad Z, Hamedani GG, Butt NS. Recent developments in distribution theory: a brief survey and some new generalized classes of distributions. Pakistan Journal of Statistics and Operation Research. 2019 Mar 23:87–110.
- 14. Modi K, Kumar D, Singh Y. A new family of distribution with application on two real datasets on survival problem. Science & Technology Asia. 2020 Mar 26:1–0.
- 15. Bowley AL. Elements of statistics. King; 1926.
- 16. Moors JJ. A quantile alternative for kurtosis. Journal of the Royal Statistical Society: Series D (The Statistician). 1988 Mar;37(1):25–32.
- 17. Arcagni A, Porro F. The graphical representation of the inequality. Revista Colombiana de estadistica. 2014;37(2):419–437.
- 18. Rényi A. On measures of entropy and information. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, volume 1: contributions to the theory of statistics 1961 Jan 1 (Vol. 4, pp. 547–562). University of California Press.
- 19. Shannon CE. Prediction and entropy of printed English. Bell system technical journal. 1951 Jan;30(1):50–64.
- 20. Evans JS, Brooks PG, Pollard P. Prior beliefs and statistical inference. British Journal of Psychology. 1985 Nov;76(4):469–77.
- 21. Swain JJ, Venkatraman S, Wilson JR. Least-squares estimation of distribution functions in Johnson’s translation system. Journal of Statistical Computation and Simulation. 1988 Jun 1;29(4):271–97.
- 22. Macdonald PD. Comments and queries comment on “an estimation procedure for mixtures of distributions” by choi and bulgren. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1971 Jul;33(2):326–9.
- 23. Cheng RC, Amin NA. Maximum product-of-spacings estimation with applications to the lognormal distribution. Math report. 1979;791.
- 24. Lee ET, Wang J. Statistical methods for survival data analysis. John Wiley & Sons; 2003 Aug 1.
- 25. Andersen PK, Borgan O, Gill RD, Keiding N. Statistical models based on counting processes. Springer Science & Business Media; 2012 Dec 6.
- 26. Joyner WB, Boore DM. Peak horizontal acceleration and velocity from strong-motion records including records from the 1979 Imperial Valley, California, earthquake. Bulletin of the seismological Society of America. 1981 Dec 1;71(6):2011–38.
- 27. Al-Babtain AA, Shakhatreh MK, Nassar M, Afify AZ. A new modified Kies family: Properties, estimation under complete and type-II censored samples, and engineering applications. Mathematics. 2020 Aug 12;8(8):1345.
- 28. Afify AZ, Suzuki AK, Zhang C, Nassar M. On three-parameter exponential distribution: properties, Bayesian and non-Bayesian estimation based on complete and censored samples. Communications in Statistics-Simulation and Computation. 2021 Nov 2;50(11):3799–819.
- 29. Tian Y, Tian M, Zhu Q. Transmuted linear exponential distribution: A new generalization of the linear exponential distribution. Communications in Statistics-Simulation and Computation. 2014 Nov 26;43(10):2661–77.
- 30. Lan Y, Leemis LM. The logistic–exponential survival distribution. Naval Research Logistics (NRL). 2008 Apr;55(3):252–64.
- 31. Nadarajah S, Haghighi F. An extension of the exponential distribution. Statistics. 2011 Dec 1;45(6):543–58.
- 32. Afify AZ, Zayed M, Ahsanullah M. The extended exponential distribution and its applications. Journal of Statistical Theory and Applications. 2018 Jun;17(2):213–29.
- 33. Aldahlan MA, Afify AZ, Ahmed AH. The odd inverse Pareto-G class: Properties and applications. J. Nonlinear Sci. Appl. 2019 May 1;12:278–90.
- 34. Khan MS, King R, Hudson IL. Transmuted generalized exponential distribution: A generalization of the exponential distribution with applications to survival data. Communications in Statistics-Simulation and Computation. 2017 Jul 3;46(6):4377–98.