
Estimating q−GEVL distribution parameters under Type II progressive censoring using particle swarm optimization

Abstract

In this article, the effect of the parameters on the properties of a well-known distribution, the q-extended extreme value distribution with linear normalization, is discussed. Moreover, these parameters are estimated by both maximum likelihood and Bayesian approaches using Type II progressive censoring. The removals of Type II progressive censoring are considered under three well-known schemes (fixed, discrete uniform, and binomial random removals). Finding effective numerical techniques is a typical challenge for statisticians when computing maximum likelihood estimates for distributions with many parameters, so one of our aims in this article is to show how metaheuristic optimization, such as particle swarm optimization, can handle this problem. Furthermore, interval estimates for the parameters are calculated using the Fisher information matrix. The Bayesian approach is applied with both informative and non-informative priors under two different loss functions (squared error and LINEX) using Lindley’s approximation. Moreover, home price data from California are well fitted by the q-extended extreme value distribution with linear normalization. Using this fit, some of California’s future home prices are predicted via the return level function.

1 Introduction

Extreme value theory (EVT) is an essential branch of statistics for modeling the behavior of extreme events. EVT has many applications, such as insurance, cybersecurity, environmental science, and other fields; [1] provided good insight into the importance of EVT in these areas. One of the most famous distributions in EVT is the generalized extreme value distribution under linear normalization (GEVL). The GEVL distribution is a continuous probability distribution that models the maximum (or minimum) of a collection of independent, identically distributed random variables. The GEVL distribution has three parameters: location, scale, and shape. These parameters determine tail heaviness and skewness, as summarized by [2]. Many researchers have used GEVL in their investigations, as in [3–5]. In statistical theory, new extensions of standard distributions are currently widely used. Usually, generators are used to add a new parameter and combine pre-existing distributions to create a new family; see [6–12]. The concept of q-analogs is used in many different mathematical and statistical contexts, giving a framework for generalizing classical concepts and structures by introducing a new parameter q. The use of q-analogs in probability and statistics enables the investigation of a wider range of distributions and features that reduce to the original distribution as q tends to 1. In this article, we consider the q-generalized extreme value distribution under linear normalization (q-GEVL). The q-GEVL distribution is an extension of the GEVL distribution that incorporates a q parameter, providing greater flexibility for modeling, as illustrated by [13]. Many other articles have considered the q-analog of their work, as in [14–19].

In statistical reliability and survival analysis, censoring describes instances in which observing the failure of all units may be impossible due to time, cost, or other constraints. Sometimes it occurs because the observation period expires before all events (such as failures or fatal crashes) have occurred. To handle this dilemma, researchers use censoring schemes. Censoring schemes are classified into several categories, including Type I, Type II, and Type II progressive censoring. In Type I (Type II) censoring schemes, the experiment terminates at a predetermined time (number of failed units), while Type II progressively censored samples allow researchers to remove units during the experiment. In this paper, we compare the behavior of different estimators under Type II progressive censoring when the number of units removed at each stage is determined in advance (fixed removal) against when it is random (discrete uniform and binomial removals). [20] described how to employ Type II progressive censoring in practice, and many authors have used Type II progressive censoring in their contributions, as in [21–23].

This article explores the estimation of the parameters of the q-GEVL distribution employing both maximum likelihood estimation (MLE) and Bayesian estimation for a Type II progressively censored sample. Since the resulting likelihood equations are not easy to solve by classical techniques such as Newton's method, we utilize the particle swarm optimization (PSO) algorithm to handle this difficulty.

PSO is considered one of the most popular optimization algorithms. It is a computational technique for solving optimization problems, inspired by the social behavior of bird flocking or fish schooling. It is a population-based stochastic optimization technique that simulates the movement of particles (possible solutions) in a search space to determine the best solution to a problem. Each particle in the swarm represents a possible solution and advances through the search space by following the swarm’s current best solution. Particles eventually converge on the optimal answer based on their own and their neighbors’ experiences; see [24] for more details. PSO has many advantages: it is effective for multi-modal problems, converges quickly, is generally robust, is less sensitive to parameter choices, and does not require gradient information.
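The particle dynamics described above can be sketched in a few lines. The following is a minimal global-best PSO for minimization, not the exact implementation used in the paper; the inertia and acceleration constants (w, c1, c2) and the sphere test function are illustrative textbook defaults.

```python
import random

def pso(fitness, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=1):
    """Minimize `fitness` over the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialise particle positions uniformly inside the box, with zero velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # each particle's best position
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # the swarm's best position
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], bounds[d][0]), bounds[d][1])
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Sanity check on the sphere function, whose minimum is at the origin.
best, val = pso(lambda p: sum(x * x for x in p), [(-5, 5)] * 3)
```

The same routine applies to likelihood maximization by passing the negative log-likelihood as the fitness function.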

The aims of the proposed article can be given as follows:

  • Estimating the q-GEVL parameters under Type II progressive censoring schemes using both maximum likelihood and Bayesian approaches.
  • Introducing metaheuristic optimization (PSO) for computing the MLE.
  • Using the fitted model to predict future measurements.

The article is organized as follows. In Sect 2, we investigate the properties of the q-GEVL distribution. In Sect 3, we outline the three removal schemes under investigation and derive the MLEs of the parameters based on Type II progressively censored samples; moreover, approximate confidence intervals are obtained. In Sect 4, Bayesian estimators are derived for both non-informative and informative priors using two different loss functions. In Sect 5, a simulation study is presented as an application of the theoretical parts of this paper. Sect 6 contains the real data example. Finally, Sect 7 summarizes the article’s findings.

2 Distribution properties

This section discusses the effect of the parameter q on the behavior of the distribution through its probability density function (PDF) and cumulative distribution function (CDF). Moreover, it presents distribution properties such as the quantile function, return level, reversed hazard rate, hazard rate, moments, skewness and kurtosis, and the moment generating function.

2.1 PDF and CDF

The PDF and CDF of the q-GEVL distribution, according to [13], are given by

(1)

and

(2)

where,

(3)

and

(4)

To investigate the impact of the parameters on the shape of the distribution, both the CDF and PDF are plotted in Fig 1 using different sets of parameters.

Fig 1. The plots of distribution CDF and PDF.

(a) (, , ), (b) (q = 1.3, , ), (c) (, ) and .

https://doi.org/10.1371/journal.pone.0323897.g001

Fig 1 shows:

  • The parameter combinations strongly influence the shapes of both the CDF and PDF.
  • The CDF curves show probability accumulating continuously as x increases; the changes in these curves indicate how the accumulation depends on the parameter values.
  • The PDF variations represent deviations in the distribution’s skewness and kurtosis, with some plots showing sharp peaks and heavy tails and others being more spread out.
  • The shape parameter controls the tail behavior, while q controls the overall height and dispersion of the curves.

2.2 Moments

Both the moments and the moment generating function (MGF) are essential concepts in statistics and probability theory that characterize probability distributions. Moreover, they are important in theoretical and practical situations since they aid in summarizing and evaluating a data distribution. The moments indicate many characteristics of a random variable, such as its central tendency, shape, variability, and tail behavior, while the MGF can be used to derive all moments of a random variable (the mean, variance, skewness, and higher-order moments).

Let X be a random variable (RV) following the q-GEVL distribution. Then the moments and the MGF are

(5)(6)

where the support of x is given in Eq 4.

It is clear from Eqs 5 and 6 that these integrals cannot be reduced to closed form, so numerical software is used to evaluate them for specified parameter values;

see Table 1.
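As a concrete illustration of evaluating such moment integrals numerically, the sketch below integrates a standard Gumbel density (the GEVL member with zero shape, used here only because its moments are known in closed form): the first raw moment should approach the Euler-Mascheroni constant and the variance should approach pi squared over six.

```python
import math

def gumbel_pdf(x, mu=0.0, sigma=1.0):
    """Standard Gumbel (max) density, the GEVL member with zero shape."""
    z = (x - mu) / sigma
    return math.exp(-z - math.exp(-z)) / sigma

def raw_moment(pdf, k, lo=-20.0, hi=40.0, n=200000):
    """k-th raw moment by the midpoint rule on a wide truncated support."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        total += (t ** k) * pdf(t) * h
    return total

m1 = raw_moment(gumbel_pdf, 1)   # mean: the Euler-Mascheroni constant
m2 = raw_moment(gumbel_pdf, 2)
var = m2 - m1 ** 2               # variance: pi**2 / 6
```

The same quadrature applies to any density with sufficiently light tails once a wide enough truncated support is chosen.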

Table 1. The statistical properties of and at and respectively.

https://doi.org/10.1371/journal.pone.0323897.t001

2.3 Quantile function, skewness, and kurtosis

The quantile function (QF) is a fundamental concept in statistics that connects probabilities with the corresponding values of a random variable. It is essentially the inverse of the CDF and is used to determine the smallest value below which a given proportion of the data falls. According to [13], the quantile function is

The skewness describes the asymmetry, or lack of symmetry, in the data, while the kurtosis helps measure the presence of outliers by assessing the heaviness of the tails and the sharpness of the peak. The commonly used formulas for skewness and kurtosis can be obtained from the distribution’s moments, as in formula 2 in Eqs 7 and 8, specifically the third moment (for skewness) and the fourth moment (for kurtosis), or from the quantile function, as given by Bowley’s skewness and Moors’ kurtosis in formula 1 in Eqs 7 and 8. Bowley’s skewness is a quantile-based measure that quantifies a distribution’s skew using quartiles, while Moors’ kurtosis is a quantile-based measure of kurtosis that provides a robust alternative to the classic moment-based kurtosis: using octiles minimizes sensitivity to extreme outliers and offers a more consistent approximation of a distribution’s tails. The skewness and kurtosis are then given as

(7)(8)

Then the skewness and kurtosis of the q-GEVL distribution can be easily obtained as

  1. For
  2. For
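The quantile-based measures above can be computed for any distribution with an available quantile function. The sketch below implements Bowley's skewness and Moors' kurtosis and checks them on the standard logistic quantile function, chosen only because its quartiles and octiles have closed forms; it is not part of the paper's model.

```python
import math

def bowley_skewness(Q):
    """Quantile-based skewness: (Q(3/4) - 2Q(1/2) + Q(1/4)) / (Q(3/4) - Q(1/4))."""
    return (Q(0.75) - 2 * Q(0.5) + Q(0.25)) / (Q(0.75) - Q(0.25))

def moors_kurtosis(Q):
    """Octile-based kurtosis: ((Q(7/8)-Q(5/8)) + (Q(3/8)-Q(1/8))) / (Q(6/8)-Q(2/8))."""
    return ((Q(7 / 8) - Q(5 / 8)) + (Q(3 / 8) - Q(1 / 8))) / (Q(6 / 8) - Q(2 / 8))

# Standard logistic quantile function Q(p) = ln(p / (1 - p)).
logistic_Q = lambda p: math.log(p / (1 - p))
sk = bowley_skewness(logistic_Q)   # 0 for any symmetric distribution
kr = moors_kurtosis(logistic_Q)    # equals ln(21/5) / ln(3)
```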

Fig 2 is provided for a better understanding of the effect of the parameters q and the shape parameter on the behavior of the skewness and kurtosis.

Fig 2. The plots of for the and at for different values of q and .

https://doi.org/10.1371/journal.pone.0323897.g002

From Fig 2:

  1. The skewness and kurtosis change non-linearly with respect to the shape parameter and q, with the scale parameter affecting the distribution's spread: a larger scale parameter results in a wider spread, while a smaller one results in a narrower distribution.
  2. The kurtosis increases sharply for larger values of the shape parameter, particularly positive ones, indicating fatter tails in the distribution.
  3. The tail behavior is affected by the shape parameter: positive values lead to positive skewness and a longer right tail, whereas negative values lead to negative skewness and a longer left tail.

2.4 The return level

The return level is an important concept in extreme value theory (EVT), particularly in applications such as hydrology, climate science, finance, and engineering, where we want to evaluate the size of an extreme event that is expected to occur over a given time period. Let T be the return period; then, mathematically, the return level z_T is the value satisfying F(z_T) = 1 - 1/T, i.e., z_T = Q(1 - 1/T).

After some calculation, the return level of the q-GEVL distribution is obtained.
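The inversion F(z_T) = 1 - 1/T can be coded directly; as the q-GEVL quantile function is not reproduced here, the sketch below uses the Gumbel case, for which z_T has a closed form. The location and scale defaults are illustrative.

```python
import math

def gumbel_return_level(T, mu=0.0, sigma=1.0):
    """Return level z_T solving F(z_T) = 1 - 1/T for the Gumbel CDF
    F(x) = exp(-exp(-(x - mu) / sigma))."""
    p = 1.0 - 1.0 / T
    return mu - sigma * math.log(-math.log(p))

# The level z_T is exceeded on average once every T periods.
z10 = gumbel_return_level(10)     # roughly 2.25 for the standard Gumbel
z100 = gumbel_return_level(100)   # roughly 4.60
```

For heavier-tailed members of the family, z_T grows faster in T, which is the pattern the tables below compare between GEVL and q-GEVL.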

Table 1 demonstrates the return level values for some parameter settings.

From Table 1:

  • For GEVL, the median and quartiles are higher compared to q-GEVL.
  • For q-GEVL, the median and quartiles are significantly lower, indicating a shift in central tendency.
  • The variance remains relatively small in all cases for q-GEVL compared to GEVL.
  • For GEVL, the Zt values remain relatively consistent across settings, while for q-GEVL the return levels are slightly higher, suggesting heavier tails and greater tail dependence.
  • GEVL exhibits negative skewness, indicating a left-skewed distribution, with a kurtosis suggesting a platykurtic nature, while q-GEVL exhibits larger absolute skewness, implying a stronger skew. Thus the q-GEVL distribution modifies the tail behavior of GEVL, making it more sensitive to extreme values and left-skewed distributions.
  • In the case of the return levels (Zt) under these parameters, there is no noticeable difference between the two distributions.
  • The results drawn from Table 1 are compatible with the real data results in Sect 6, demonstrated in Table 9.

The results in Table 1 are particularly useful for scenarios where modeling extreme lower-end behavior, such as financial risk and environmental extremes, is critical.

2.5 The hazard rate and reversed hazard rate

The hazard rate (HR) and reversed hazard rate (RHR) are two essential concepts in reliability theory and survival analysis. They describe different features of a system’s or individual’s failure or survival over time. The HR and RHR of a distribution are h(x) = f(x)/(1 - F(x)) and r(x) = f(x)/F(x), respectively. By using Eqs 1 and 2, it is easy to get

and

In Fig 3 we display some plots of the HR and RHR for various parameter values.

Fig 3. The plots of distribution and .

(a) (, , ), (b) (q = 0.9, , ), (c) (, ) and .

https://doi.org/10.1371/journal.pone.0323897.g003

From Fig 3, it is clear that the hazard rate can rise or fall, depending on the parameter values, reflecting the various risks of systems or events. In most circumstances, the HR decreases for small parameter values, while the RHR increases for small parameter values.
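Both rates follow directly from any PDF/CDF pair. The sketch below checks the definitions h(x) = f(x)/(1 - F(x)) and r(x) = f(x)/F(x) against the exponential distribution, whose hazard rate is constant; the rate value 2 is purely illustrative.

```python
import math

def hazard(pdf, cdf, x):
    """Hazard rate h(x) = f(x) / (1 - F(x))."""
    return pdf(x) / (1.0 - cdf(x))

def reversed_hazard(pdf, cdf, x):
    """Reversed hazard rate r(x) = f(x) / F(x)."""
    return pdf(x) / cdf(x)

# Check with the exponential distribution (rate 2), whose hazard is constant.
lam = 2.0
pdf = lambda x: lam * math.exp(-lam * x)
cdf = lambda x: 1.0 - math.exp(-lam * x)
h1 = hazard(pdf, cdf, 1.0)            # equals lam for the exponential
r1 = reversed_hazard(pdf, cdf, 1.0)
```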

3 MLE of q-GEVL under progressive Type-II censoring with different types of removals

In this section, the MLE of the q-GEVL parameters is presented, using PSO, for both point and interval estimation under three cases of Type II progressive censoring removals (fixed, discrete uniform, and binomial). The interval estimation of the parameters is conducted using the Fisher information matrix.

3.1 Types of removals for Type II progressive censoring

Let R = (r1, r2, ..., rm) be the vector of the Type II progressive censoring scheme. Then the joint PDF of the progressively censored sample is

(9)

where 0 < r1 < N - n. For more details on how to obtain Eq 9, see [25].
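The joint density in Eq 9 suggests a direct simulation recipe. The sketch below uses the Balakrishnan-Sandhu uniform representation (see [20,25]) to draw a progressively Type-II censored sample from any distribution with a tractable quantile function; the exponential quantile and the removal scheme R are illustrative choices.

```python
import math
import random

def progressive_type2_sample(quantile, R, seed=0):
    """Draw a progressively Type-II censored sample via the uniform
    representation of Balakrishnan and Sandhu: R[j] surviving units are
    withdrawn at the j-th observed failure."""
    rng = random.Random(seed)
    m = len(R)
    W = [rng.random() for _ in range(m)]
    # V_i = W_i ** (1 / (i + R_m + ... + R_{m-i+1})), i = 1..m
    V, tail = [], 0
    for i in range(1, m + 1):
        tail += R[m - i]
        V.append(W[i - 1] ** (1.0 / (i + tail)))
    # U_i = 1 - V_m * V_{m-1} * ... * V_{m-i+1} are the ordered censored
    # uniform variates; push them through the target quantile function.
    U, prod = [], 1.0
    for i in range(1, m + 1):
        prod *= V[m - i]
        U.append(1.0 - prod)
    return [quantile(u) for u in U]

# Example: m = 5 observed failures out of N = 10 units, removing one unit at
# each failure, from a unit-rate exponential distribution (quantile -ln(1-u)).
x = progressive_type2_sample(lambda u: -math.log(1.0 - u), [1, 1, 1, 1, 1])
```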

3.2 MLE of distribution parameters

Let X be a random variable following the q-GEVL distribution. Then the joint likelihood function based on Type II progressively censored data is given by

where

(10)

where S is defined in Eq 3. By using Eqs 1 and 2 in Eq 10, we obtain

(11)

Then the MLEs of the q-GEVL parameters satisfy

(12)
(13)
(14)
(15)
(16)
(17)
(18)

where

Since Eqs 12–15 and 16–18 are very difficult to solve analytically or numerically by well-known traditional methods, PSO is suggested to handle this problem. Statisticians and researchers can easily incorporate it into their code by following the steps outlined in [26].

For the implementation of PSO, follow these steps:

  1. Define the log-likelihood function, which serves as the fitness function.
  2. Specify the search interval for each parameter.
  3. Execute the optimization using the (psoptim) function in R.
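The steps above can be sketched end to end. Since the q-GEVL likelihood (Eq 11) is not reproduced here, an exponential lifetime model is used as a stand-in fitness function, and the compact one-dimensional PSO below plays the role of R's psoptim; the data, removal scheme, and tuning constants are all illustrative.

```python
import math
import random

def neg_loglik(lam, x, R):
    """Negative log-likelihood for Exponential(rate=lam) under progressive
    Type-II censoring: each term is log f(x_i) + R_i * log(1 - F(x_i)),
    which simplifies to log(lam) - lam * (1 + R_i) * x_i."""
    if lam <= 0:
        return float("inf")
    return -sum(math.log(lam) - lam * (1.0 + Ri) * xi for xi, Ri in zip(x, R))

def pso_1d(f, lo, hi, n=25, iters=150, seed=3):
    """Compact one-dimensional global-best PSO, an illustrative stand-in
    for R's psoptim."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n)]
    vel = [0.0] * n
    pb, pbv = pos[:], [f(p) for p in pos]
    g = min(range(n), key=lambda i: pbv[i])
    gb, gbv = pb[g], pbv[g]
    for _ in range(iters):
        for i in range(n):
            vel[i] = (0.7 * vel[i]
                      + 1.5 * rng.random() * (pb[i] - pos[i])
                      + 1.5 * rng.random() * (gb - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], lo), hi)
            v = f(pos[i])
            if v < pbv[i]:
                pb[i], pbv[i] = pos[i], v
                if v < gbv:
                    gb, gbv = pos[i], v
    return gb

# Step 1: fitness from toy censored data; step 2: bounds; step 3: optimize.
x = [0.10, 0.25, 0.47, 0.81, 1.30]
R = [1, 0, 1, 0, 2]
lam_hat = pso_1d(lambda lam: neg_loglik(lam, x, R), 1e-6, 20.0)
# For this model the MLE has the closed form m / sum((1 + R_i) * x_i),
# so the swarm's answer can be verified directly.
```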

3.3 Fisher information matrix (FIM)

The Fisher information matrix (FIM) is a key concept in statistical estimation and information theory that is used to evaluate the parameters’ confidence intervals. The FIM is the matrix of the negative second partial derivatives of the logarithm of the joint likelihood function given in Eq 10. Then,

where S is given in Eq 3 and the derivatives are evaluated at the MLE. The variance-covariance matrix of the parameters is the inverse of the FIM.

Hence each estimator is asymptotically normally distributed, with mean equal to the true parameter and variance given by the corresponding diagonal element of the inverse FIM. The confidence interval of a parameter is then the usual Wald interval: the estimate plus or minus z(alpha/2) times its estimated standard error.
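The Wald interval described above can be computed numerically even when the information matrix has no closed form. The sketch below approximates the observed information by a finite-difference second derivative of the log-likelihood at the MLE, using a one-parameter exponential log-likelihood and made-up data purely for illustration.

```python
import math

def loglik_exp(lam, data):
    """Exponential log-likelihood: n * ln(lam) - lam * sum(x)."""
    return len(data) * math.log(lam) - lam * sum(data)

def wald_ci(loglik, theta_hat, z=1.96, h=1e-5):
    """95% Wald interval from the observed information, approximated by a
    central second difference of the log-likelihood at the MLE."""
    d2 = (loglik(theta_hat + h) - 2.0 * loglik(theta_hat)
          + loglik(theta_hat - h)) / h ** 2
    info = -d2                        # observed Fisher information
    se = math.sqrt(1.0 / info)        # asymptotic standard error
    return theta_hat - z * se, theta_hat + z * se

data = [0.3, 1.1, 0.7, 2.4, 0.9, 1.6, 0.5, 1.2]   # illustrative lifetimes
lam_hat = len(data) / sum(data)                    # closed-form MLE of rate
lo, hi = wald_ci(lambda t: loglik_exp(t, data), lam_hat)
```

In the multi-parameter case the same idea applies with a numerical Hessian, whose inverse diagonal supplies the per-parameter variances.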

4 Bayesian estimation

In this section we consider the Bayesian estimation (BE) of the q-GEVL parameters for both informative and non-informative priors, under the three cases of Type II progressive censoring discussed above, using two different loss functions: the squared error loss function (sq) and the LINEX loss function (lx).

Informative

For this case, suppose that all the parameters in S follow exponential distributions with different hyperparameters (b1–b4), while the prior PDF of P follows a beta distribution with hyperparameters c and d. Then the joint prior PDF can be written as

The method for selecting the prior PDF is proposed in [25] and used in [27]. Then the joint posterior PDF is

where L(q,S) is given in Eq 10.

Non-informative

For this case, suppose that all the parameters follow a uniform distribution on the interval [0,1]. Then the joint prior PDF equals 1, and the joint posterior PDF is proportional to L given in Eq 10. For both cases, the Bayesian estimator under the LINEX (lx) and squared error (sq) loss functions is given by

(19)

where the expectation in Eq 19 is taken with respect to the joint posterior PDF. For more details on the loss functions in Eq 19, see [25]. Since this expectation cannot be reduced analytically, Lindley’s approximation is used; see [28].
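Eq 19 itself is not reproduced above; in the standard formulation of these two loss functions (consistent with [25]), the estimators take the form

```latex
\hat{\theta}_{sq} = E\left[\theta \mid \underline{x}\right],
\qquad
\hat{\theta}_{lx} = -\frac{1}{\nu}\,
\ln E\left[e^{-\nu \theta} \mid \underline{x}\right], \quad \nu \neq 0,
```

where both expectations are taken with respect to the joint posterior PDF, and the LINEX estimator tends to the squared-error one as the parameter nu tends to 0.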

5 Simulation

In this section, a Monte Carlo simulation is used to evaluate the performance of the parameter estimators discussed earlier, assuming the true parameter values are (0.1, 0.5, 0.5, 0.2). To test the sensitivity of the estimation techniques to sample size, 1000 datasets are generated for each of several sample sizes. Further, Lindley’s approximation is used to compute the Bayesian estimates under both sq and lx loss functions for three values of the LINEX parameter, namely {-1, 1, 0.5}, with informative (inf) and non-informative (non) priors. The estimators are thoroughly assessed using Bias and mean squared error (MSE). Additionally, confidence intervals for the parameters are determined. Tables 2, 3, 4, and 5 display the Bias and MSE of the estimators under the three removal scenarios of Type II progressive censoring. Moreover, the confidence interval results are reported in terms of the upper bound, lower bound, and interval length.

Table 5. The result of the confidence interval for the parameter estimation.

https://doi.org/10.1371/journal.pone.0323897.t005

From Tables 2, 3, 4, and 5

  1. The MLE obtained via PSO consistently performs well in both Bias and MSE across all sample sizes, illustrating one of the proposed algorithm’s main advantages.
  2. The Bayesian estimators vary with the sample size and removal distribution, demonstrating significant improvement as the sample size grows.
  3. The lx(0.5) estimator has the largest MSE and Bias, especially for small sample sizes.
  4. The Lindley approach is especially successful with larger sample sizes, as suggested by [28]. It works well with samples of 100 or more but is inappropriate for the proposed distribution when sample sizes are below 50.
  5. The confidence intervals improve as the sample size increases, as seen in the decreasing interval lengths.

6 Real data

Investigating real estate prices is important due to their effect on wealth distribution, economic stability, and personal finances. In the present paper, a data set of size 297 representing home price data in California from 2020 to the present is considered. For the data source see [29]; the data are provided in Table 6.

This data set is fitted to both the GEVL and q-GEVL distributions, and the results in Table 7 and Fig 4 show that both give a good fit for these data under different fitting measures.

Fig 4. The CDF and empirical distribution for both GEVL and q-GEVL for home prices in California.

https://doi.org/10.1371/journal.pone.0323897.g004

Table 7. The different fitting measures for home price in California.

https://doi.org/10.1371/journal.pone.0323897.t007

Moreover, a summary of the statistics and the statistical properties of this data set are provided in Tables 8 and 9.

From Tables 8 and 9, and Fig 5:

  • Both distributions have positive skewness, indicating a slight right tail (more extreme high values than low values).
  • Both distributions have kurtosis values greater than 1, so both exhibit a slightly heavier tail than a normal distribution.
  • The q-GEVL model predicts higher return levels compared to GEVL, especially for larger return periods (20 and 30 years). This indicates that more extreme home price values are predicted over time, which is compatible with the results in Table 1.

Fig 5. The plots for home prices of real estate in California.

https://doi.org/10.1371/journal.pone.0323897.g005

Table 8. Summary of the Statistics for Home Price in California.

https://doi.org/10.1371/journal.pone.0323897.t008

7 Conclusion

In this paper, the properties of the q-extended extreme value distribution with linear normalization are presented, showing that the q-GEVL distribution is particularly useful for scenarios where modeling extreme lower-end behavior, such as financial risk and environmental extremes, is critical. The particle swarm optimization (PSO) algorithm is then used to compute the MLE of the parameters. Both MLE and Bayesian estimation under Type-II progressive censoring schemes (fixed, binomial, and discrete uniform random removals) are conducted for the parameters, and both point and interval estimation are considered. Lindley’s approximation is used in the Bayesian estimation for both informative and non-informative priors under the squared error and LINEX loss functions. Moreover, a simulation study is presented showing that the MLE via PSO consistently produces outstanding results in both Bias and MSE for different sample sizes, highlighting the strengths of the suggested algorithm. In contrast, the performance of the Bayesian estimators fluctuates with sample size and removal distribution, with substantial improvements as sample sizes increase. The lx(0.5) estimator gives the highest MSE and Bias, especially for small sample sizes. Lindley’s method performs well for sample sizes of 100 or more but fails for the suggested distribution when the sample size is less than 50. Finally, an example of real home prices in California is considered as one application of the q-GEVL distribution, showing that it predicts more extreme home price values over time, which is compatible with the results in Table 1.

References

  1. 1. Gilli M, Këllezi E. An application of extreme value theory for measuring financial risk. Comput Econ. 2006;27(2–3):207–28.
  2. 2. Fisher RA, Tippett LHC. Limiting forms of the frequency distribution of the largest or smallest member of a sample. Math Proc Camb Phil Soc. 1928;24(2):180–90.
  3. 3. Bali TG. The generalized extreme value distribution. Econ Lett. 2003;79(3):423–7.
  4. 4. Bertin E, Clusel M. Generalized extreme value statistics and sum of correlated variables. J Phys A: Math Gen. 2006;39(24):7607–19.
  5. 5. Hosking JRM, Wallis JR, Wood EF. Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. Technometrics. 1985;27(3):251–61.
  6. 6. Bleed SO, Attwa RA-E, Ali RFM, Radwan T. On alpha power transformation generalized pareto distribution and some properties. J Appl Math. 2024;2024(1):6270350.
  7. 7. Attwa RA-E, Radwan T, Zaid EOA. Bivariate q-extended Weibull morgenstern family and correlation coefficient formulas for some of its sub-models. MATH. 2023;8(11):25325–42.
  8. 8. Attwa R, Zaid E. Record values from the Gumbel and q-Gumbel distributions with applications. Thailand Stat. 2024;22(4):750–68.
  9. 9. Dey S, Sharma VK, Mesfioui M. A new extension of weibull distribution with application to lifetime data. Ann Data Sci. 2017;4(1):31–61.
  10. 10. Mudholkar GS, Srivastava DK. Exponentiated Weibull family for analyzing bathtub failure-rate data. IEEE Trans Rel. 1993;42(2):299–302.
  11. 11. Marshall A, Olkin I. A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families. Biometrika. 1997;84(3):641–52.
  12. 12. Eugene N, Lee C, Famoye F. Beta-normal distribution and its applications. Commun Statist Theory Methods. 2002;31(4):497–512.
  13. 13. Provost S, Saboor A, Cordeiro G, Mansoor M. On the q-generalized extreme value distribution. Revstat-Statist J. 2018;16(1):45–70.
  14. 14. Zaid EOA, Attwa RAE-W, Radwan T. Some measures information for generalized and q-generalized extreme values and its properties. Fractals. 2022;30(10):2240246.
  15. 15. Attwa RAE-W, Radwan T. Applying generalized Type-II hybrid censored samples on generalized and q-generalized extreme value distributions under linear normalization. Symmetry. 2023;15(10):1869.
  16. 16. Nair SS, Jayakumar K. Generalized q -logistic distribution. Commun Statist- Simulat Comput. 2022;53(8):3771–87.
  17. 17. Mathai A, Provost S. The q-extended inverse Gaussian distribution. J Probab Stat Sci. 2011;9:1–20.
  18. 18. Budini AA. Extended q-Gaussian and q-exponential distributions from gamma random variables. Phys Rev E Stat Nonlin Soft Matter Phys. 2015;91(5):052113. pmid:26066125
  19. 19. Afify AZ, Zayed M, Ahsanullah M. The extended exponential distribution and its applications. JSTA. 2018;17(2):213.
  20. 20. Balakrishnan N, Aggarwala R. Progressive censoring: theory, methods, and applications. New York: Springer; 2000.
  21. 21. Yao H, Gui W. Inference on exponentiated Rayleigh distribution with constant stress partially accelerated life tests under progressive type-II censoring. J Appl Stat. 2024:1–29.
  22. 22. Prakash A, Maurya RK, Alsadat N, Obulezi OJ. Parameter estimation for reduced Type-I Heavy-Tailed Weibull distribution under progressive Type-II censoring scheme. Alexandria Eng J. 2024;109:935–49.
  23. 23. Maiti K, Kayal S. Estimation of parameters and reliability characteristics for a generalized Rayleigh distribution under progressive type-II censored sample. Commun Statist - Simulat Comput. 2019;50(11):3669–98.
  24. 24. Sharma A, Sharma A, Pandey JK, Ram M. Swarm intelligence: foundation, principles, and engineering applications. CRC Press; 2022.
  25. 25. Attwa RAE-W, Sadk SW, Aljohani HM. Investigation the generalized extreme value under liner distribution parameters for progressive type-II censoring by using optimization algorithms. MATH. 2024;9(6):15276–302.
  26. 26. Particle swarm optimization with R. 2022 [cited 2024 Oct 19]. Available from: https://reintech.io/blog/particle-swarm-optimization-with-r
  27. 27. Attwa RAE-W, Sadk SW, Radwan T. Estimation of Marshall–Olkin extended generalized extreme value distribution parameters under progressive Type-II censoring by using a genetic algorithm. Symmetry. 2024;16(6):669.
  28. 28. Lindley DV. Approximate Bayesian methods. Trabajos de Estadistica Y de Investigacion Operativa. 1980;31(1):223–45.
  29. 29. Zillow. Zillow research data. 2024 [cited 2024 Nov 01]. Available from: https://www.zillow.com/research/data/