Abstract
In this article, the effect of the parameters on the properties of the q-extended extreme value distribution with linear normalization is discussed. Moreover, these parameters are estimated by both maximum likelihood and Bayesian approaches under type-II progressive censoring. Three well-known removal schemes for type-II progressive censoring are considered (fixed, discrete uniform, and binomial random removals). Finding effective numerical techniques for computing maximum likelihood estimates of distributions with many parameters is a typical challenge for statisticians, so one of the aims of this article is to show how a metaheuristic optimizer, particle swarm optimization, can handle this problem. Furthermore, interval estimates for the parameters are calculated using the Fisher information matrix. The Bayesian approach is applied with both informative and non-informative priors under two different loss functions (squared error and LINEX) using Lindley's approximation. Finally, home price data from California are well fitted by the q-extended extreme value distribution with linear normalization, and this fit is used, via the return level function, to predict some future California home prices.
Citation: Abd El-Wahab Attwa R, Sadk SW, Aljohani HM (2025) Estimating distribution parameters under Type II progressive censoring using particle swarm optimization. PLoS One 20(5): e0323897.
https://doi.org/10.1371/journal.pone.0323897
Editor: Hilary Izuchukwu Okagbue, Covenant University, NIGERIA
Received: February 18, 2025; Accepted: April 15, 2025; Published: May 28, 2025
Copyright: © 2025 Attwa et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All the data used in this paper are within the article.
Funding: The authors extend their appreciation to Taif University, Saudi Arabia, for supporting this work through project number (TU-DSPP-2024-162).
Competing interests: No authors have competing interests.
1 Introduction
Extreme value theory (EVT) is an essential branch of statistics for modeling the behavior of extreme events. EVT has many applications, such as insurance, cybersecurity, environmental science, and other fields; [1] provided good insight into the importance of EVT in these areas. One of the most famous distributions in EVT is the generalized extreme value distribution under linear normalization (GEVL). The GEVL distribution is a continuous probability distribution that models the maximum (or minimum) of a group of independent, identically distributed random variables. The GEVL distribution has three parameters: location, scale, and shape. These parameters determine tail heaviness and skewness, as summarized by [2]. Many researchers have used GEVL in their investigations, e.g., [3–5]. In statistical theory, extensions of standard distributions are currently widely used: generators are typically used to add a new parameter or to combine preexisting distributions into a new family; see [6–12]. The concept of q-analogs is used in many different mathematical and statistical contexts, giving a framework for generalizing classical concepts and structures by introducing a new parameter q. The use of q-analogs in probability and statistics enables the investigation of a wider range of distributions, whose features reduce to those of the initial distribution as q tends to 1. In this article, we consider the q-generalized extreme value distribution under linear normalization (q-GEVL). The q-GEVL distribution is an extension of the GEVL distribution that incorporates a parameter q, providing greater flexibility for modeling, as illustrated by [13]. Many other articles have considered the q-analog of their work, e.g., [14–19].
In statistical reliability and survival analysis, censoring describes situations in which observing the failure of all units may be impossible due to time, cost, or other constraints. Sometimes it occurs because the observation period expires before all events (such as failures or fatal crashes) have occurred. To handle this, researchers use censoring schemes, which are classified into several categories, including Type I, Type II, and Type II progressive censoring. In a Type I (Type II) censoring scheme, the experiment terminates at a predetermined time (number of failed units), while Type II progressive censoring allows researchers to remove units during the experiment. In this paper, we compare the behavior of different estimators under Type II progressive censoring when the number of units removed at each stage is determined in advance (fixed removal) versus unknown (discrete uniform and binomial random removals). [20] showed how to employ Type II progressive censoring in code, and many authors have used Type II progressive censoring in their contributions, e.g., [21–23].
This article explores the estimation of the q-GEVL parameters employing maximum likelihood estimation (MLE) and Bayesian estimation (BE) for a type-II progressively censored sample. Since the resulting likelihood equations are not easy to solve by classical techniques such as Newton's method, we utilize the particle swarm optimization (PSO) algorithm to handle the difficulty of the MLE. PSO is one of the most popular optimization algorithms: a computational technique for solving optimization problems, inspired by the social behavior of bird flocks and fish schools. It is a population-based stochastic optimization technique that simulates the movement of particles (candidate solutions) through a search space to determine the best solution to a problem. Each particle in the swarm represents a possible solution and advances through the search space by following the swarm's current best solution; particles eventually converge toward the optimum based on their own and their neighbors' experiences; see [24] for more details. PSO has many advantages, such as effectiveness for multi-modal problems, fast convergence, general robustness, low sensitivity to parameter settings, and no need for gradient information.
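The swarm dynamics described above can be sketched in a few lines. This is a generic minimal PSO, not the paper's exact implementation; the inertia and acceleration constants below are common illustrative defaults. It minimizes an arbitrary fitness function over box constraints:

```python
import numpy as np

def pso_minimize(f, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer: each particle tracks its personal best,
    the swarm tracks a global best, and velocities blend inertia, cognitive,
    and social terms."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    dim = len(bounds)
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # particle positions
    v = np.zeros_like(x)                               # particle velocities
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()             # global best position
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)                     # stay inside the search box
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, pbest_val.min()

# usage: minimize the sphere function; the optimum is at the origin
best, val = pso_minimize(lambda p: np.sum(p**2), bounds=[(-5, 5)] * 2)
```

In the estimation problem below, `f` would be the negative log-likelihood and `bounds` the search intervals of the parameters.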
The aims of the proposed article are as follows:
- Estimating the q-GEVL parameters under Type II progressive censoring schemes using both MLE and BE.
- Introducing the PSO optimization technique for the MLE.
- Using the fitted model to predict some future measurements.
The article is organized as follows. In Sect 2, we investigate the properties of the q-GEVL distribution. In Sect 3, we outline the three removal cases under investigation and derive the MLE of the q-GEVL parameters based on type-II progressively censored samples; moreover, approximate confidence intervals are obtained. In Sect 4, BE is derived for both non-informative and informative priors using two different loss functions. In Sect 5, a simulation is presented as an application of the theoretical parts of this paper. Sect 6 contains a real data example. Finally, Sect 7 summarizes the article's findings.
2 q-GEVL distribution properties
This section discusses the effect of the parameter q on the behavior of the distribution through the probability density function (PDF) and the cumulative distribution function (CDF). Moreover, it covers distribution properties such as the quantile function, return level, reversed hazard rate, hazard rate, moments, skewness, kurtosis, and the moment generating function.
2.1 PDF and CDF
The PDF and CDF of the q-GEVL distribution are given, according to [13], by Eqs (1) and (2), where the function S is defined in Eq (3) and the support of x in Eq (4).
To investigate the impact of the parameters on the shapes of the distribution, both the CDF and PDF are plotted in Fig 1 using different sets of parameters, shown in panels (a)–(c); panel (b) uses q = 1.3.
Fig 1 shows:
- The parameter combinations strongly influence the shapes of both the CDF and PDF.
- The CDF curves reflect how probability accumulates continuously as x increases; the changes in these curves indicate how the probability depends on the parameter values.
- The PDF variations reflect differences in the distribution's skewness and kurtosis, with some plots showing sharp peaks and heavy tails and others being more spread out.
- The shape parameter controls the tail behavior, while q controls the overall height and dispersion of the curves.
2.2 Moments
Both the moments and the moment generating function (MGF) are essential concepts in statistics and probability theory that characterize probability distributions. Moreover, they are important in theoretical and practical situations since they aid in summarizing and evaluating a data distribution. The moments describe many characteristics of a random variable, including its central tendency, shape, variability, and tail behavior, while the MGF can be used to derive all moments of a random variable (such as the mean, variance, skewness, and higher-order moments).
Let X be a random variable (RV) following the q-GEVL distribution; its moments and MGF are then given by Eqs (5) and (6), where the support of x is given in Eq (4).
It is clear from Eqs (5)–(6) that these integrals cannot be reduced to a closed form, so numerical software is used to evaluate them for specified parameter values; see Table 1.
2.3 Quantile function, skewness, and kurtosis
The quantile function (Q) is a fundamental concept in statistics that describes the connection between probabilities and the corresponding values of a random variable. It is essentially the inverse of the CDF and is used to determine the minimum value at which a particular proportion of the data falls. The quantile function of the q-GEVL distribution is given according to [13].
Skewness (SK) describes the asymmetry, or lack of symmetry, in the data, while kurtosis (KU) helps measure the presence of outliers by assessing the heaviness of the tails and the sharpness of the peak. The commonly used formulas for SK and KU can be obtained either from the distribution's moments, as in formula 2 of Eqs (7) and (8), specifically the third moment (for skewness) and the fourth moment (for kurtosis), or from the quantile function, as given by Bowley's skewness and Moors' kurtosis in formula 1 of Eqs (7) and (8). Bowley's skewness is a quantile-based skewness measure that quantifies a distribution's skew using quartiles, SK_B = (Q(3/4) − 2Q(1/2) + Q(1/4)) / (Q(3/4) − Q(1/4)). Moors' kurtosis is a quantile-based measure of kurtosis, KU_M = (Q(7/8) − Q(5/8) + Q(3/8) − Q(1/8)) / (Q(6/8) − Q(2/8)), which provides a robust alternative to the classic moment-based kurtosis: using octiles minimizes sensitivity to extreme outliers and offers a more consistent approximation of a distribution's tails. SK and KU are then given in Eqs (7) and (8) and can easily be obtained.
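Since Bowley's skewness and Moors' kurtosis depend only on the quantile function, they are easy to compute numerically. A sketch, with a Gumbel quantile function standing in for the q-GEVL quantile function of [13]:

```python
import numpy as np
from scipy.stats import gumbel_r  # stand-in distribution with a known quantile function

def bowley_skewness(Q):
    """Bowley's quantile-based skewness: (Q(3/4) - 2Q(1/2) + Q(1/4)) / (Q(3/4) - Q(1/4))."""
    q1, q2, q3 = Q(0.25), Q(0.5), Q(0.75)
    return (q3 - 2 * q2 + q1) / (q3 - q1)

def moors_kurtosis(Q):
    """Moors' octile-based kurtosis: (E7 - E5 + E3 - E1) / (E6 - E2), with E_i = Q(i/8)."""
    e = [Q(i / 8) for i in range(1, 8)]
    return (e[6] - e[4] + e[2] - e[0]) / (e[5] - e[1])

Q = gumbel_r(loc=0.5, scale=0.2).ppf
sk, ku = bowley_skewness(Q), moors_kurtosis(Q)  # Gumbel is right-skewed, so sk > 0
```

Both measures are location-scale invariant, which is why they react only to the shape of the quantile function.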
Fig 2 is provided for a better understanding of the effect of the parameter q and the shape parameter on the behavior of SK and KU. From Fig 2:
- SK and KU change non-linearly with respect to the parameters, and the scale parameter affects the distribution properties: a larger scale parameter results in a wider spread, whereas a smaller scale parameter results in a narrower distribution.
- KU increases sharply for larger values of the shape parameter, particularly positive ones, indicating fatter tails in the distribution.
- The tail behavior is affected by the sign of the shape parameter: positive values lead to positive SK with a longer right tail, whereas negative values lead to negative SK and a longer left tail.
2.4 The return level
The return level is an important concept in the study of extreme value theory (EVT), particularly in applications such as hydrology, climate science, finance, and engineering, where we want to evaluate the size of an extreme event that is expected to occur over a given time period. Let T be the return period; then mathematically, the return level z_T is the value satisfying F(z_T) = 1 − 1/T, i.e., z_T = Q(1 − 1/T), where Q is the quantile function. After some calculation, an explicit expression for the return level of the q-GEVL distribution is obtained.
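Numerically, the return level is just a quantile evaluation. A sketch using the classical GEV (scipy's genextreme) as a stand-in for the q-GEVL quantile function, with illustrative parameter values:

```python
import numpy as np
from scipy.stats import genextreme  # classical GEV as a stand-in for q-GEVL

def return_level(T, quantile_fn):
    """Return level z_T: the value exceeded on average once every T periods,
    i.e. the (1 - 1/T) quantile of the fitted extreme-value distribution."""
    return quantile_fn(1.0 - 1.0 / T)

Q = genextreme(c=-0.1, loc=100.0, scale=10.0).ppf  # scipy's c is minus the shape
levels = {T: return_level(T, Q) for T in (10, 20, 30)}
# return levels grow with the return period
```

This is exactly the calculation used for the home price predictions in Sect 6, with the fitted q-GEVL quantile function in place of `Q`.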
Table 1 presents the return levels and summary statistics for some parameter values. From Table 1:
- For GEVL, the median and quartiles are higher under one of the parameter settings than under the other.
- For q-GEVL, the median and quartiles are significantly lower under the second setting, indicating a shift in central tendency.
- The variance remains relatively small for all cases of q-GEVL compared to GEVL.
- For GEVL, the Z_T values remain relatively consistent across parameter values, while for q-GEVL the return levels are slightly higher than for GEVL, suggesting heavier tails and greater tail dependence.
- GEVL exhibits negative skewness, indicating a left-skewed distribution, with a kurtosis suggesting a platykurtic nature, while q-GEVL exhibits larger absolute skewness, implying a stronger skew. The q-GEVL distribution thus modifies the tail behavior of GEVL, making it more sensitive to extreme values and left-skewed distributions.
- For the return levels (Z_T) under these parameters, there is no noticeable difference between the two distributions.
- The results of Table 1 are compatible with the results for the real data in Sect 6 demonstrated in Table 9.
The results in Table 1 are particularly useful for scenarios where modeling extreme lower-end behavior, such as financial risk and environmental extremes, is critical.
2.5 The hazard rate and reversed hazard rate
The hazard rate (HR) and reversed hazard rate (RHR) are two essential concepts in reliability theory and survival analysis; they describe different features of a system's or individual's long-term failure or survival. Using Eqs (1)–(2), the HR and RHR of the q-GEVL distribution are easily obtained. In Fig 3 we display some plots of the HR and RHR for selected parameter values.
Panels (a)–(c) use different parameter sets; panel (b) uses q = 0.9.
From Fig 3, it is clear that the hazard rate can rise or fall periodically depending on the parameter values, reflecting the varying risks of systems or events. In most circumstances, one of the two rates decreases for small values of the shape parameter, while the other increases.
3 MLE of q-GEVL under progressive Type-II censoring with different types of removals
In this section, the MLE of the q-GEVL parameters is presented for both point and interval estimation under the three removal schemes of Type II progressive censoring (fixed, discrete uniform, and binomial random removals). The interval estimation of the q-GEVL parameters is conducted using the Fisher information matrix.
3.1 Types of removals of Type II progressive censoring
Let R = (r_1, r_2, ..., r_n) be the vector of the Type II progressive censoring scheme. Then the joint PDF of the progressively censored sample is given in Eq (9), where the removals satisfy 0 < r_1 < N − n together with the usual constraints on the scheme. For more details on how to obtain Eq (9), see [25].
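A progressively Type-II censored sample with a given removal vector can be simulated with the standard uniform-transformation algorithm described in [20]. A sketch, using a Gumbel quantile function as a stand-in for the q-GEVL one:

```python
import numpy as np
from scipy.stats import gumbel_r

def progressive_type2_sample(quantile_fn, scheme, seed=0):
    """Simulate a progressively Type-II censored sample: at the i-th observed
    failure, scheme[i] surviving units are removed from the test, so
    n = m + sum(scheme) units start the experiment (Balakrishnan & Aggarwala, 2000)."""
    r = np.asarray(scheme)
    m = len(r)
    rng = np.random.default_rng(seed)
    w = rng.random(m)
    # exponents a_i = i + r_m + r_{m-1} + ... + r_{m-i+1}
    a = np.arange(1, m + 1) + np.cumsum(r[::-1])
    v = w ** (1.0 / a)
    # u_1 < ... < u_m is a progressively censored U(0,1) sample
    u = 1.0 - np.cumprod(v[::-1])
    # map through the target quantile function (Gumbel here as a stand-in)
    return quantile_fn(u)

# scheme (2, 0, 1, 0, 2): 5 observed failures out of n = 10 units on test
x = progressive_type2_sample(gumbel_r(loc=0.5, scale=0.2).ppf, scheme=[2, 0, 1, 0, 2])
```

Random removals (discrete uniform or binomial) are handled by drawing the `scheme` vector itself from the corresponding removal distribution before calling the function.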
3.2 MLE of the q-GEVL distribution parameters
Let X be a random variable following the q-GEVL distribution. The joint likelihood function based on Type II progressive censored data is given in Eq (10), where S is defined in Eq (3). By using Eqs (1)–(2) in Eq (10), we obtain Eq (11), and the MLEs of the q-GEVL parameters solve the score equations in Eqs (12)–(18). Since Eqs (12)–(15) and Eqs (16)–(18) are very difficult and challenging to solve analytically or numerically by the well-known traditional methods, PSO is suggested to handle this problem. Statisticians and researchers can easily incorporate it into their code by following the steps outlined in [26].
To implement PSO, follow these steps:
- Define the log-likelihood function, which is called the fitness function.
- Specify the search interval for each parameter.
- Execute the optimization using the psoptim function in R.
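The three steps above can be illustrated end to end. The sketch below uses a Gumbel log-likelihood as a hypothetical stand-in for the q-GEVL likelihood of Eq (11), and scipy's population-based differential_evolution as a stand-in for R's psoptim:

```python
import numpy as np
from scipy.optimize import differential_evolution
from scipy.stats import gumbel_r

# simulated data from a Gumbel(2.0, 0.5) model (stand-in for q-GEVL)
rng = np.random.default_rng(1)
data = gumbel_r(loc=2.0, scale=0.5).rvs(size=300, random_state=rng)

# step 1: the fitness function is the negative log-likelihood
def neg_loglik(theta):
    mu, sigma = theta
    if sigma <= 0:
        return np.inf
    return -gumbel_r(loc=mu, scale=sigma).logpdf(data).sum()

# step 2: a search interval for each parameter
bounds = [(0.0, 5.0), (0.05, 3.0)]

# step 3: run a population-based global optimizer over those intervals
res = differential_evolution(neg_loglik, bounds, seed=2, tol=1e-8)
mu_hat, sigma_hat = res.x
```

For the q-GEVL model, `neg_loglik` would instead evaluate minus the log of Eq (11), and `bounds` would cover all five parameters including q.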
3.3 Fisher information matrix (FIM)
The Fisher information matrix (FIM) is a key tool in statistical estimation and information theory that enables the evaluation of confidence intervals for the parameters. The observed FIM equals the matrix of the negative second partial derivatives of the logarithm of the joint likelihood function given in Eq (10), evaluated at the MLE, where S is given in Eq (3). The variance-covariance matrix of the parameters is the inverse of the FIM. Hence each parameter estimator is asymptotically normally distributed with mean the true parameter value and variance the corresponding diagonal element of the inverse FIM, so the 100(1 − α)% confidence interval of a parameter θ is θ̂ ± z_{1−α/2}·√(Var(θ̂)).
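As a numerical illustration of these Wald intervals (again with a Gumbel stand-in model; the q-GEVL log-likelihood of Eq (10) would replace `neg_loglik` in practice), the observed FIM can be approximated by central finite differences:

```python
import numpy as np
from scipy.stats import gumbel_r, norm

rng = np.random.default_rng(3)
data = gumbel_r(loc=2.0, scale=0.5).rvs(size=500, random_state=rng)

def neg_loglik(theta):
    mu, sigma = theta
    return -gumbel_r(loc=mu, scale=sigma).logpdf(data).sum()

def numeric_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of f at x."""
    x = np.asarray(x, float)
    k = len(x)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            e_i, e_j = np.eye(k)[i] * h, np.eye(k)[j] * h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return H

mle = np.array([2.0, 0.5])                    # stand-in for the fitted values
obs_info = numeric_hessian(neg_loglik, mle)   # observed Fisher information
cov = np.linalg.inv(obs_info)                 # asymptotic variance-covariance
se = np.sqrt(np.diag(cov))
z = norm.ppf(0.975)
ci = np.column_stack([mle - z * se, mle + z * se])  # 95% Wald intervals
```

The same recipe applies to the five q-GEVL parameters once their MLEs are available from the PSO step.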
4 Bayesian estimation
In this section we consider the Bayesian estimation (BE) of the q-GEVL parameters for both informative and non-informative priors under the three cases of Type II progressive censoring discussed above, using two different loss functions: the squared error loss function (sq) and the LINEX loss function (lx).
Informative
For this case, suppose that all the parameters in S follow exponential distributions with different hyperparameters (b1–b4), while the prior PDF of P follows a beta distribution with hyperparameters c and d. The joint prior PDF can then be written accordingly; this method of selecting the prior PDF is proposed in [25] and used in [27]. The joint posterior PDF is then proportional to the prior times the likelihood L(q,S) given in Eq (10).
Non-informative
For this case, suppose that all the parameters follow a uniform distribution on the interval [0,1]. The joint prior PDF then equals 1, and the joint posterior PDF is proportional to the likelihood L given in Eq (10). For both cases, the BE using the LINEX (lx) and squared error (sq) loss functions is given by Eq (19), where the expectation in Eq (19) is taken with respect to the joint posterior PDF. For more details on the loss functions in Eq (19), see [25], where they are discussed in detail. Since the expectation in Eq (19) cannot be reduced analytically, Lindley's approximation is used; see [28].
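For intuition, the two Bayes estimates can be illustrated with posterior draws (a Monte Carlo stand-in; the paper instead evaluates these expectations analytically via Lindley's approximation). Under squared error loss the Bayes estimate is the posterior mean, and under LINEX loss with shape a it is −(1/a)·ln E[e^{−aθ}]:

```python
import numpy as np

def bayes_estimates(posterior_draws, a=0.5):
    """Bayes estimates of a scalar parameter from posterior draws:
    - squared-error loss  -> posterior mean
    - LINEX loss, shape a -> -(1/a) * log E[exp(-a * theta)]"""
    theta = np.asarray(posterior_draws)
    sq = theta.mean()
    lx = -np.log(np.mean(np.exp(-a * theta))) / a
    return sq, lx

# synthetic Gamma(5, 0.1) "posterior" with mean 0.5, purely for illustration
rng = np.random.default_rng(4)
draws = rng.gamma(shape=5.0, scale=0.1, size=100_000)
sq, lx = bayes_estimates(draws, a=1.0)
# LINEX with a > 0 penalizes overestimation, so lx < sq
```

The asymmetry parameter a plays the same role as the values {−1, 0.5, 1} used in the simulation section.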
5 Simulation
In this section, a Monte Carlo simulation is used to evaluate the performance of the parameter estimators discussed above, assuming true parameter values of (0.1, 0.5, 0.5, 0.2). To test the sensitivity of the estimation techniques to sample size, 1000 datasets are generated for each of several sample sizes. Further, Lindley's approximation is used to compute the BE under both the sq and lx loss functions for three values of the LINEX parameter, namely {−1, 0.5, 1}, with informative (inf) and non-informative (non) priors. The estimators are thoroughly assessed using bias and mean squared error (MSE). Additionally, a confidence interval for the parameters is determined. Tables 2, 3, 4, and 5 display the bias and MSE of the estimators under the three removal scenarios of type II progressive censoring. Moreover, the confidence interval results are reported in terms of the upper bound (UB), lower bound (LB), and length of the interval (L).
- The MLE consistently performs well in both bias and MSE across all sample sizes, illustrating one of the main advantages of the proposed PSO algorithm.
- The Bayesian estimators vary with the sample size and removal distribution, demonstrating significant improvement as the sample size grows.
- The lx(0.5) estimator has the largest MSE and bias, especially for small sample sizes.
- The Lindley approach is especially successful with larger sample sizes, as suggested by [28]; it works well with samples of 100 or more but is inappropriate for the proposed distribution when sample sizes are below 50.
- The confidence intervals improve as the sample size increases, with their lengths decreasing.
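The bias and MSE criteria used in the tables are computed as follows (a skeleton with a normal-mean estimator standing in for the actual q-GEVL estimators):

```python
import numpy as np

def bias_mse(estimates, true_value):
    """Monte Carlo bias and MSE of an estimator from replicated estimates."""
    est = np.asarray(estimates)
    bias = est.mean() - true_value
    mse = np.mean((est - true_value) ** 2)
    return bias, mse

# skeleton: replicate datasets, estimate on each, then summarize
rng = np.random.default_rng(5)
true_mu = 0.1
reps = [rng.normal(true_mu, 1.0, size=50).mean() for _ in range(1000)]
bias, mse = bias_mse(reps, true_mu)
# for the sample mean, MSE should sit near sigma^2 / n = 1 / 50 = 0.02
```

In the paper's simulation, each replicate would instead be a progressively censored q-GEVL sample and each estimate one of the MLE or Bayesian estimators above.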
6 Real data
Investigating real estate prices is important because they affect wealth distribution, economic stability, and personal finances. Therefore, in the present paper, a data set of size 297 representing home price data in California from 2020 onwards is considered. For the data source see [29]; the data are provided in Table 6.
This data set is fitted to both the GEVL and q-GEVL distributions, and the results in Table 7 and Fig 4 show that both give a good fit to these data under several goodness-of-fit measures.
Moreover, summary statistics and statistical properties of this data set are provided in Tables 8 and 9.
From Tables 8 and 9, and Fig 5:
- Both distributions have positive skewness, indicating a slight right tail (more extreme high values than low values).
- Both distributions have kurtosis values greater than 1, so both exhibit somewhat non-normal tail behavior.
- The q-GEVL model predicts higher return levels than GEVL, especially for the larger return periods (20 and 30 years). This indicates that it predicts more extreme home price values over time, which is compatible with the results in Table 1.
7 Conclusion
In this paper, the properties of the q-extended extreme value distribution with linear normalization (q-GEVL) are presented, showing that it is particularly useful for scenarios where modeling extreme lower-end behavior, such as financial risk and environmental extremes, is critical. Particle swarm optimization (PSO) is then used for computing the MLE of the q-GEVL parameters. Both MLE and BE under type-II progressive censoring schemes (fixed, binomial, and discrete uniform random removals) are derived for the q-GEVL parameters, and both point and interval estimation are considered. Lindley's approximation was used in the Bayesian estimation for both informative and non-informative priors, under the squared error and LINEX loss functions. Moreover, a simulation study is presented showing that the MLE consistently produces outstanding results in both bias and MSE for different sample sizes, highlighting the strengths of the suggested PSO algorithm. In contrast, the performance of the Bayesian estimators fluctuates with sample size and removal distribution, with substantial improvements as sample sizes increase. The lx(0.5) estimator yields the highest MSE and bias, especially for small sample sizes. Lindley's method performs well for sample sizes of 100 or more but fails for the suggested distribution when the sample size is less than 50. Finally, real home price data from California are considered as an application of the q-GEVL distribution, showing that it predicts more extreme home price values over time, which is compatible with the results in Table 1.
References
- 1. Gilli M, Këllezi E. An application of extreme value theory for measuring financial risk. Comput Econ. 2006;27(2–3):207–28.
- 2. Fisher RA, Tippett LHC. Limiting forms of the frequency distribution of the largest or smallest member of a sample. Math Proc Camb Phil Soc. 1928;24(2):180–90.
- 3. Bali TG. The generalized extreme value distribution. Econ Lett. 2003;79(3):423–7.
- 4. Bertin E, Clusel M. Generalized extreme value statistics and sum of correlated variables. J Phys A: Math Gen. 2006;39(24):7607–19.
- 5. Hosking JRM, Wallis JR, Wood EF. Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. Technometrics. 1985;27(3):251–61.
- 6. Bleed SO, Attwa RA-E, Ali RFM, Radwan T. On alpha power transformation generalized pareto distribution and some properties. J Appl Math. 2024;2024(1):6270350.
- 7. Attwa RA-E, Radwan T, Zaid EOA. Bivariate q-extended Weibull morgenstern family and correlation coefficient formulas for some of its sub-models. MATH. 2023;8(11):25325–42.
- 8. Attwa R, Zaid E. Record values from the Gumbel and q-Gumbel distributions with applications. Thailand Stat. 2024;22(4):750–68.
- 9. Dey S, Sharma VK, Mesfioui M. A new extension of weibull distribution with application to lifetime data. Ann Data Sci. 2017;4(1):31–61.
- 10. Mudholkar GS, Srivastava DK. Exponentiated Weibull family for analyzing bathtub failure-rate data. IEEE Trans Rel. 1993;42(2):299–302.
- 11. Marshall A, Olkin I. A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families. Biometrika. 1997;84(3):641–52.
- 12. Eugene N, Lee C, Famoye F. Beta-normal distribution and its applications. Commun Statist Theory Methods. 2002;31(4):497–512.
- 13. Provost S, Saboor A, Cordeiro G, Mansoor M. On the q-generalized extreme value distribution. Revstat-Statist J. 2018;16(1):45–70.
- 14. Zaid EOA, Attwa RAE-W, Radwan T. Some measures information for generalized and q-generalized extreme values and its properties. Fractals. 2022;30(10):2240246.
- 15. Attwa RAE-W, Radwan T. Applying generalized Type-II hybrid censored samples on generalized and q-generalized extreme value distributions under linear normalization. Symmetry. 2023;15(10):1869.
- 16. Nair SS, Jayakumar K. Generalized q -logistic distribution. Commun Statist- Simulat Comput. 2022;53(8):3771–87.
- 17. Mathai A, Provost S. The q-extended inverse Gaussian distribution. J Probab Stat Sci. 2011;9:1–20.
- 18. Budini AA. Extended q-Gaussian and q-exponential distributions from gamma random variables. Phys Rev E Stat Nonlin Soft Matter Phys. 2015;91(5):052113. pmid:26066125
- 19. Afify AZ, Zayed M, Ahsanullah M. The extended exponential distribution and its applications. JSTA. 2018;17(2):213.
- 20. Balakrishnan N, Aggarwala R. Progressive censoring: theory, methods, and applications. New York: Springer; 2000.
- 21. Yao H, Gui W. Inference on exponentiated Rayleigh distribution with constant stress partially accelerated life tests under progressive type-II censoring. J Appl Stat. 2024:1–29.
- 22. Prakash A, Maurya RK, Alsadat N, Obulezi OJ. Parameter estimation for reduced Type-I Heavy-Tailed Weibull distribution under progressive Type-II censoring scheme. Alexandria Eng J. 2024;109:935–49.
- 23. Maiti K, Kayal S. Estimation of parameters and reliability characteristics for a generalized Rayleigh distribution under progressive type-II censored sample. Commun Statist - Simulat Comput. 2019;50(11):3669–98.
- 24. Sharma A, Sharma A, Pandey JK, Ram M. Swarm intelligence: foundation, principles, and engineering applications. CRC Press; 2022.
- 25. Attwa RAE-W, Sadk SW, Aljohani HM. Investigation the generalized extreme value under liner distribution parameters for progressive type-II censoring by using optimization algorithms. MATH. 2024;9(6):15276–302.
- 26. Particle swarm optimization with R. 2022 [cited 2024 Oct 19]. Available from: https://reintech.io/blog/particle-swarm-optimization-with-r
- 27. Attwa RAE-W, Sadk SW, Radwan T. Estimation of Marshall–Olkin extended generalized extreme value distribution parameters under progressive Type-II censoring by using a genetic algorithm. Symmetry. 2024;16(6):669.
- 28. Lindley DV. Approximate Bayesian methods. Trabajos de Estadistica Y de Investigacion Operativa. 1980;31(1):223–45.
- 29. Zillow. Zillow research data. 2024 [cited 2024 Nov 01]. Available from: https://www.zillow.com/research/data/