Abstract
Since Shannon’s formulation of entropy theory in 1948 and Jaynes’ introduction of the principle of maximum entropy (POME) in 1957, entropy applications have proliferated across a wide range of research areas, including the hydrological and environmental sciences. In addition to POME, the method of probability-weighted moments (PWM) was introduced and recommended as an alternative to classical moments. The PWM is thought to be less affected by sampling variability and more efficient at obtaining robust parameter estimates. To enhance the PWM, the self-determined probability-weighted moments method was introduced by Haktanir (1997). In this article, we estimate the parameters of the Kumaraswamy distribution using the aforementioned methods. These methods are compared to two traditional methods, maximum likelihood and the conventional method of moments, using Monte Carlo simulations. A numerical example based on real data illustrates the implementation of the proposed procedures.
Citation: Helu A (2022) The principle of maximum entropy and the probability-weighted moments for estimating the parameters of the Kumaraswamy distribution. PLoS ONE 17(5): e0268602. https://doi.org/10.1371/journal.pone.0268602
Editor: Jayajit Das, The Research Institute at the Nationwide Children’s Hospital and the Ohio State University, UNITED STATES
Received: December 13, 2021; Accepted: May 3, 2022; Published: May 31, 2022
Copyright: © 2022 Amal Helu. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The author has declared that no competing interests exist.
Introduction
A wide variety of methods have been developed for parameter estimation; see e.g. [1] for a discussion based on some (frequently limited) data sets. Common methods include, but are not limited to, the maximum likelihood method and the method of moments. The former is the most important, since it yields efficient parameter estimators that are asymptotically normal. However, maximum likelihood (MLE) estimates are often difficult to obtain, computationally demanding, and overly sensitive to extreme values, especially in small samples. Although the MLE method is satisfactory for large samples, the final estimate is not always a global maximum, because it can depend on the starting value. Moreover, the MLE method frequently does not lend itself to closed-form or easily manipulated algebraic expressions, even though it is known to provide asymptotically minimum-variance, but not necessarily unbiased, estimators.
The method of moments (MOM) is widely applied due to its relative ease of use. Moreover, the MOM can provide starting values for the numerical procedures involved in MLE estimation.
Other estimation methods have been developed as alternatives. Among these is the method of probability-weighted moments (PWM), a generalization of the classical moments of a probability distribution. [2, 3] showed that the PWM can remove the ambiguity from the MLE. This method, initiated by [4], constitutes a leading alternative to the moment and MLE methods for fitting statistical distributions whose inverse cumulative distribution function is available in closed form; that is, if X is a random variable and F is the value of its cumulative distribution function, the value of X can be written explicitly as a function of F: X = X(F). [4, 5] showed that the PWM yields simple expressions for the parameters of most distributions, including several for which parameter estimates are not readily obtained by the MLE or conventional moments methods. In addition, the PWM method is thought to be less affected by sampling variability and more efficient at producing robust parameter estimates from small samples [6].
In order to enhance the accuracy of the PWM, [7] introduced the self-determined probability-weighted moments (SD-PWM) method. The SD-PWM method can more accurately account for the variation in the data sample, as well as for any outliers that may be present. Moreover, the SD-PWM method can reveal whether a particular distribution is suitable for portraying the behavior of the sample data [7–9].
In the case of scarce data, the principle of maximum entropy (POME) is used to generate the least biased probability distribution, which makes it appropriate for applications such as hydrological frequency analysis, where large amounts of data are not available. [10] used entropy to quantitatively describe the uncertainty, or information content, of a random event X.
[11] developed the principle of maximum entropy (POME) as a tool for choosing a specific probability distribution from the set of feasible solutions. The chosen distribution maximizes the entropy function subject to the given information constraints, via the method of Lagrange multipliers. Hence, this distribution is consistent with the given information but retains maximum uncertainty within the feasible domain, and is thus the least biased [12]. The parameters of the distribution can therefore be obtained by maximizing the entropy function. [13] showed that, for given information such as the mean, variance, or skewness, the distribution derived by POME best represents X; implicitly, it best represents the sample from which the information was derived. Conversely, if one wishes to fit a particular probability distribution to sample data, POME can uniquely specify the constraints (the information) needed to derive that distribution. The distribution parameters are then related to these constraints. An excellent discussion of the underlying mathematical rationale is given in [14, 15].
The Kumaraswamy distribution, also known as the generalized beta distribution of the first kind, is widely used for risk analysis and reliability modeling, with great success in biomedical and epidemiological research. It is particularly well suited to natural phenomena whose outcomes are bounded below and above, such as individuals’ heights, test scores, atmospheric temperatures, and economic and hydrological data.
This distribution was developed by [16]. [16, 17] have shown that Kumaraswamy’s distribution can approximate many different distributions, depending on the parameters α and β; for example, the uniform and triangular distributions, among many others. The Kumaraswamy distribution has received considerable attention in the literature, among others [18–23]. The probability density function (pdf) and the cumulative distribution function (cdf) of a two-parameter Kumaraswamy random variable X are written as
f(x) = α β x^(α − 1) (1 − x^α)^(β − 1),  0 < x < 1, (1)
F(x) = 1 − (1 − x^α)^β,  0 < x < 1, (2)
respectively, where α > 0 and β > 0 are the shape parameters. For simplicity we will use Kum(α, β) to represent the two-parameter Kumaraswamy probability density function. The plots of the pdf and the cdf of the Kum(α, β) distribution for selected parameter values are represented in Figs (1)–(6).
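To make the notation concrete, here is a minimal sketch of the Kum(α, β) pdf, cdf, and quantile function (our own illustration; the paper’s computations use SAS/IML, and the function names below are ours):

```python
import numpy as np

def kum_pdf(x, a, b):
    """pdf of Kum(a, b): f(x) = a*b*x^(a-1)*(1 - x^a)^(b-1), 0 < x < 1."""
    return a * b * x**(a - 1) * (1 - x**a)**(b - 1)

def kum_cdf(x, a, b):
    """cdf of Kum(a, b): F(x) = 1 - (1 - x^a)^b."""
    return 1 - (1 - x**a)**b

def kum_quantile(u, a, b):
    """Quantile (inverse cdf): x(F) = (1 - (1 - F)^(1/b))^(1/a)."""
    return (1 - (1 - u)**(1.0 / b))**(1.0 / a)
```

Because the quantile function is available in closed form, samples can be drawn by the inverse-cdf method, x = x(U) with U uniform on (0, 1); this closed-form invertibility is also what makes the PWM approach attractive for this distribution.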
The objective of this paper is to develop new competitive methods of parameter estimation for the Kumaraswamy distribution and to identify the most efficient estimator. These methods include the probability-weighted moments, the self-determined probability-weighted moments, and the principle of maximum entropy.
The performance of each of these methods is studied and compared with that of its traditional counterparts, namely maximum likelihood estimation and the method of moments, using extensive simulations. The estimates are compared in terms of their resulting biases, mean squared errors, coverage probabilities, and confidence interval lengths. Confidence intervals for the unknown parameters are constructed using the bootstrap method (see [24]). In addition, the bootstrap method is used to calculate the bootstrap bias, standard error, and lower and upper confidence limits for the real-life data sets.
Methods of estimation
In this section, we discuss the principle of maximum entropy (POME) in detail, followed by the method of moments (MOM) and its variants, the probability-weighted moments and the self-determined probability-weighted moments, and finally the maximum likelihood method, in each case describing the procedure adopted for estimation.
Principle of maximum entropy (POME)
Among the parameter estimation methods, entropy, which is a measure of uncertainty of random variables, has attracted much attention and has been used for a variety of applications in hydrology [25].
The Shannon entropy function H(f) for a continuous random variable X can be expressed as:
H(f) = −∫_0^1 f(x) ln f(x) dx. (3)
To estimate the parameters α and β of Eq (1), the POME method maximizes H(f) by establishing a relationship between the constraints and the Lagrange multipliers. This is achieved by specifying suitable constraints, deriving the entropy function of the Kum(α, β) distribution, and finally obtaining the relationship between the Lagrange multipliers and these constraints. A complete mathematical discussion of this method can be found in [13], Levine and Tribus [14], Singh and Rajagopal [15], and [26, 27].
Specification of constraints.
Since the maximization of the entropy relies on the constraints to be satisfied by the Kum(α, β) density, the first step in applying POME is to determine these constraints. We do so by taking the natural logarithm of Eq (1), which is written as
ln f(x) = ln α + ln β + (α − 1) ln x + (β − 1) ln(1 − x^α). (4)
Multiplying Eq (4) by [−f(x)] and integrating from 0 to 1, we obtain the entropy function:
H(f) = −ln α − ln β − (α − 1) E[ln X] − (β − 1) E[ln(1 − X^α)]. (5)
To maximize H(f) in Eq (5), the following constraints should be satisfied.
∫_0^1 f(x) dx = 1 = C1, (6)

∫_0^1 ln x · f(x) dx = E[ln X] = C2, (7)

∫_0^1 ln(1 − x^α) f(x) dx = E[ln(1 − X^α)] = C3, (8)
where E(·) denotes the expectation of the bracketed quantity and C1, C2, and C3 are the constraint values (see [15]). To derive the POME estimates of the parameters of Kum(α, β), Eqs (6)–(8) become the constraints. A complete mathematical discussion of the rationale for deriving the constraints in this manner can be found in [13–15].
Construction of the zeroth Lagrange multiplier.
The least biased pdf, f(x), consistent with Eqs (6)–(8), and corresponding to the POME takes the following form
f(x) = exp(−λ0 − λ1 ln x − λ2 ln(1 − x^α)), (9)
where λ0, λ1, and λ2 are the Lagrange multipliers. The λs represent the information content of each constraint; i.e., if λi = 0, the ith constraint is redundant and has no informational value, and hence does not reduce the level of uncertainty. The mathematical rationale for Eq (9) is presented in [14].
Applying Eq (9) to the total probability condition in Eq (6), one obtains:
exp(λ0) = ∫_0^1 x^(−λ1) (1 − x^α)^(−λ2) dx = (1/α) B((1 − λ1)/α, 1 − λ2), (10)

where B(·,·) is the beta function.
Taking the logarithm of Eq (10) gives the zeroth Lagrange multiplier λ0 as a function of the Lagrange multipliers λ1 and λ2:
λ0 = −ln α + ln B((1 − λ1)/α, 1 − λ2). (11)
The inverse of Eq (10) is:
exp(−λ0) = α / B((1 − λ1)/α, 1 − λ2). (12)
Derivation of entropy function.
Substituting Eq (12) into Eq (9) produces
f(x) = [α / B((1 − λ1)/α, 1 − λ2)] x^(−λ1) (1 − x^α)^(−λ2). (13)
Comparing Eq (13) with Eq (1), we recognize that
λ1 = 1 − α,  λ2 = 1 − β. (14)
Using Eq (13), the entropy function, given by Eq (3), can be written as
H(f) = λ0 + λ1 E[ln X] + λ2 E[ln(1 − X^α)]. (15)
Relation between the distribution parameters and the constraints.
According to [15], the relations between the distribution parameters and the constraints are obtained by taking partial derivatives of Eq (15) with respect to the Lagrange multipliers as well as the distribution parameters, equating each of these derivatives to zero, and making use of the constraints. To that end, taking partial derivatives of Eq (15) with respect to λ1, λ2, and β separately and equating each derivative to zero yields:
E[ln X] = (1/α)[ψ((1 − λ1)/α) − ψ((1 − λ1)/α + 1 − λ2)], (16)
E[ln(1 − X^α)] = ψ(1 − λ2) − ψ((1 − λ1)/α + 1 − λ2), (17)
(18)
where ψ(·) = d ln Γ(·)/d(·) is the digamma function. Substituting Eq (14) into Eqs (16)–(18) and recalling the constraints in Eqs (6)–(8) yields, respectively:
E[ln X] = (1/α)[ψ(1) − ψ(1 + β)], (19)
E[ln(1 − X^α)] = ψ(β) − ψ(1 + β) = −1/β, (20)
(21)
Clearly, Eq (21) does not hold. Therefore, the parameter estimation equations for POME consist of Eqs (19) and (20). Replacing the expectations in Eqs (19) and (20) by their sample estimates and simplifying leads to
(1/n) Σ_{i=1}^n ln xi = (1/α̂)[ψ(1) − ψ(1 + β̂)], (22)
(1/n) Σ_{i=1}^n ln(1 − xi^α̂) = −1/β̂. (23)
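As a numerical sketch (our own illustration, assuming the POME estimating equations reduce to matching the sample means of ln x and ln(1 − x^α) to their theoretical Kumaraswamy values, as in Eqs (19) and (20)), the system can be solved with a standard root finder; a log-parameterization keeps α and β positive:

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.special import digamma

def pome_estimates(x):
    """Solve the two POME estimating equations for (alpha, beta):
      mean(ln x)           = (1/alpha) * (psi(1) - psi(1 + beta)),
      mean(ln(1 - x^alpha)) = -1/beta."""
    s1 = np.mean(np.log(x))
    def eqs(p):
        a, b = np.exp(p)  # work on the log scale so a, b stay positive
        return [s1 - (digamma(1.0) - digamma(1.0 + b)) / a,
                np.mean(np.log(1 - x**a)) + 1.0 / b]
    return np.exp(fsolve(eqs, [0.0, 0.0]))
```

The starting value exp(0) = 1 for both parameters is a crude but usually adequate choice; in practice MOM estimates could seed the solver instead.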
Method of moments (MOM)
The method of moments estimates the parameters of a probability distribution by matching the sample moments

m1 = (1/n) Σ_{i=1}^n xi, (24)

m2 = (1/n) Σ_{i=1}^n xi^2, (25)

with the corresponding theoretical moments

E(X) = β B(1 + 1/α, β), (26)

E(X^2) = β B(1 + 2/α, β), (27)

where B(·,·) is the beta function. Setting E(X) = m1 and E(X^2) = m2 and solving Eqs (26) and (27) gives the MOM estimates α̂ and β̂.
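The two moment equations have no closed-form solution, but a root finder handles them directly. A sketch (ours; it assumes the standard Kumaraswamy moment formula E[X^r] = β·B(1 + r/α, β)):

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.special import beta as beta_fn

def mom_estimates(x):
    """Match the first two sample moments to E[X^r] = b * B(1 + r/a, b)."""
    m1, m2 = np.mean(x), np.mean(x**2)
    def eqs(p):
        a, b = np.exp(p)  # log scale keeps a, b positive
        return [b * beta_fn(1 + 1/a, b) - m1,
                b * beta_fn(1 + 2/a, b) - m2]
    return np.exp(fsolve(eqs, [0.0, 0.0]))
```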
Probability weighted moments (PWM)
The probability weighted moments of a random variable X with cumulative distribution function F(x) = P(X ≤ x) and quantile X = X(F) are formally defined as:
M_{p,r,s} = E[X^p F^r (1 − F)^s] = ∫_0^1 [x(F)]^p F^r (1 − F)^s dF, (28)
where Mp,r,s is the probability weighted moment of order (p, r, s), E is the expectation operator and p, r, and s are real numbers. If r = s = 0 and p is a nonnegative integer, then Mp,0,0 represents the conventional moment about the origin of order p. Two useful sets of Mp,r,s are defined as:
αs = M_{1,0,s} = ∫_0^1 x(F) (1 − F)^s dF,  s = 0, 1, 2, …, (29)
and
βr = M_{1,r,0} = ∫_0^1 x(F) F^r dF,  r = 0, 1, 2, …. (30)
The two PWM sets, αs and βr, are linear combinations of each other; [4] favored αs and [6] favored βr. Either set can therefore be used. In practice, one chooses the set for which Eq (2) can be most easily inverted analytically.
For nonnegative integers s and r, [5] introduced unbiased estimators of αs and βr based on the ordered sample x(1), x(2), …, x(n) from the distribution F, defined as:
as = (1/n) Σ_{i=1}^n x(i) [(n − i)(n − i − 1)⋯(n − i − s + 1)] / [(n − 1)(n − 2)⋯(n − s)], (31)
and
br = (1/n) Σ_{i=1}^n x(i) [(i − 1)(i − 2)⋯(i − r)] / [(n − 1)(n − 2)⋯(n − r)], (32)
where
(33)
and
(34)
are estimates of the exceedance (1 − F(x)) and the non-exceedance F(x) probabilities, respectively. These estimates are not based on the assumed distribution, but are based solely on the position of x(i) within the ordered sample.
For the Kum(α, β) distribution, we prefer to work with the PWMs of the form αs given by Eq (29). The inverse (quantile) function of Eq (2) is given by
x(F) = [1 − (1 − F)^(1/β)]^(1/α). (35)
The PWM αs for Kum(α, β) is given by:

αs = β B(1 + 1/α, β(s + 1)), (36)

where B(·,·) is the beta function. Since the Kumaraswamy distribution is a two-parameter distribution, only the first two PWMs, α0 and α1, are needed. They are given as follows:
α0 = β B(1 + 1/α, β), (37)
α1 = β B(1 + 1/α, 2β). (38)
To obtain the PWM estimates α̂ and β̂, the population PWMs α0 and α1 in Eqs (37) and (38) are replaced by their sample estimators a0 and a1 from Eq (31); the resulting equations are:

a0 = β̂ B(1 + 1/α̂, β̂), (39)

a1 = β̂ B(1 + 1/α̂, 2β̂). (40)
Note that, based on the ordered sample x(1), x(2), …, x(n) of size n from the distribution F, the estimator as in Eq (31) is an unbiased estimator of αs [5, 6].
The values of the Kumaraswamy parameters can then be determined by solving Eqs (39) and (40), which result from substituting the sample estimators into Eqs (37) and (38).
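A numerical sketch of the PWM fit (our own illustration; the s = 1 weights (n − i)/(n − 1) follow Eq (31), and the closed form αs = β·B(1 + 1/α, β(s + 1)) is assumed):

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.special import beta as beta_fn

def pwm_estimates(x):
    """PWM fit: a0 is the sample mean; a1 weights the ordered sample by
    (n - i)/(n - 1); then (alpha, beta) solve
    a_s = beta * B(1 + 1/alpha, beta*(s + 1)) for s = 0, 1."""
    xs = np.sort(x)
    n = len(xs)
    i = np.arange(1, n + 1)
    a0 = xs.mean()
    a1 = np.mean(xs * (n - i) / (n - 1))
    def eqs(p):
        a, b = np.exp(p)  # log scale keeps a, b positive
        return [b * beta_fn(1 + 1/a, b) - a0,
                b * beta_fn(1 + 1/a, 2*b) - a1]
    return np.exp(fsolve(eqs, [0.0, 0.0]))
```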
Self-determined probability-weighted moments (SD-PWM)
To estimate the distribution parameters accurately, we presume that the sample observations follow the Kum(α, β) distribution and therefore exhibit the relevant behavior of that distribution. An inspection of Eq (31) shows that the exceedance probability is not assigned to x(i) according to the assumed distribution, but rather based solely on the position of x(i) within the ordered sample. As mentioned in the introduction, the SD-PWM method was developed to improve the estimation performance of the PWM method by using the assumed distribution. The SD-PWM method assigns the exceedance probability of each observation via the cumulative distribution function of the assumed distribution; therefore, the SD-PWM sample estimator of αs is defined as:
âs = (1/n) Σ_{i=1}^n x(i) [1 − F(x(i))]^s, (41)

which, using the cdf in Eq (2), becomes

âs = (1/n) Σ_{i=1}^n x(i) (1 − x(i)^α)^(βs). (42)
Since the Kumaraswamy distribution is a two-parameter distribution, only the first two SD-PWM sample estimators are needed; hence,

â0 = (1/n) Σ_{i=1}^n x(i), (43)

â1 = (1/n) Σ_{i=1}^n x(i) (1 − x(i)^α)^β. (44)

The values of the Kumaraswamy parameters under the SD-PWM method are obtained by replacing α0 and α1 in Eqs (37) and (38) by â0 and â1 from Eqs (43) and (44).
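One way to realize this in code (a sketch of our reading of the method, not necessarily Haktanir’s original algorithm): start from crude parameter values, assign each observation’s exceedance probability from the currently fitted cdf, re-solve the PWM equations, and iterate to a fixed point:

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.special import beta as beta_fn

def sdpwm_estimates(x, n_iter=15):
    """SD-PWM sketch: the exceedance probability 1 - F(x) comes from the
    fitted Kumaraswamy cdf rather than from ranks; iterate the fit."""
    a, b = 1.0, 1.0  # crude starting values; PWM estimates would also work
    a0_hat = np.mean(x)  # the s = 0 term carries no weight
    for _ in range(n_iter):
        a1_hat = np.mean(x * (1 - x**a)**b)  # s = 1, weights from fitted cdf
        def eqs(p):
            aa, bb = np.exp(p)  # log scale keeps parameters positive
            return [bb * beta_fn(1 + 1/aa, bb) - a0_hat,
                    bb * beta_fn(1 + 1/aa, 2*bb) - a1_hat]
        a, b = np.exp(fsolve(eqs, [np.log(a), np.log(b)]))
    return a, b
```

Each pass warm-starts the solver at the previous estimates, so later iterations are cheap.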
Maximum likelihood method
Let x1, x2, …, xn be a random sample from Kum(α, β), using Eq (1) the log-likelihood function is given by
ℓ(α, β) = n ln α + n ln β + (α − 1) Σ_{i=1}^n ln xi + (β − 1) Σ_{i=1}^n ln(1 − xi^α). (45)
The MLEs of the parameters α and β, denoted by α̂ and β̂ respectively, are obtained by taking the first derivatives of (45) with respect to α and β and equating the resulting normal equations to zero:
∂ℓ/∂α = n/α + Σ_{i=1}^n ln xi − (β − 1) Σ_{i=1}^n [xi^α ln xi / (1 − xi^α)] = 0, (46)
∂ℓ/∂β = n/β + Σ_{i=1}^n ln(1 − xi^α) = 0. (47)
Note that there is no explicit solution to Eqs (46) and (47). Hence, a numerical method such as the Newton–Raphson method can be used to obtain the MLEs of α and β.
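A sketch of one convenient numerical route (ours, assuming the standard Kumaraswamy log-likelihood): for fixed α, Eq (47) gives β in closed form, β = −n / Σ ln(1 − xi^α), so the log-likelihood can be profiled and maximized over α alone:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mle_estimates(x):
    """Profile-likelihood MLE: beta has a closed form given alpha, so we
    maximize the profiled log-likelihood over alpha only."""
    n = len(x)
    slx = np.sum(np.log(x))
    def neg_profile_loglik(a):
        t = np.sum(np.log(1 - x**a))
        b = -n / t  # closed-form beta for this alpha, from Eq (47)
        return -(n*np.log(a) + n*np.log(b) + (a - 1)*slx + (b - 1)*t)
    a_hat = minimize_scalar(neg_profile_loglik,
                            bounds=(1e-2, 30), method="bounded").x
    b_hat = -n / np.sum(np.log(1 - x**a_hat))
    return a_hat, b_hat
```

Profiling reduces the two-dimensional search to a bounded one-dimensional one, which sidesteps the starting-value sensitivity noted in the introduction.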
Real life data
This section considers two real-life data sets, namely maximum flood level and relief time for 50 arthritic patients, to demonstrate the proposed method and verify its effectiveness. The validity of the Kumaraswamy model was checked using Kolmogorov-Smirnov (K − S), Anderson-Darling (A − D), and chi-square tests.
Example 1
The data for this application were obtained in a civil engineering context. They represent the maximum flood level (in millions of cubic feet per second) for the Susquehanna River at Harrisburg, Pennsylvania, over four-year periods, the first value being 0.654 for 1890–1893 and the last being 0.265 for 1966–1969. The data were utilized by [28] and are given in Table (1).
It is observed that K-S = 0.21091 with p-value = 0.29281, A-D = 0.93218, and chi-square distance = 2.4424 with a corresponding p-value = 0.29488. This indicates that the Kumaraswamy model provides a good fit to these data. Fig (7) gives the histogram of the data set and the plot of the fitted density. The QQ plot in Fig (8) suggests that the Kumaraswamy distribution is very suitable for these data. In contrast, some hydrologists believe flooding can be assessed by unbounded distributions (see [29]).
The parametric bootstrap percentile method is used to compute the parameter estimates, the bootstrap estimates (BootEst), and their corresponding standard errors (StdErr). A 95% confidence interval is calculated and reported as (LCL, UCL). The output of the bootstrap analysis, along with the parameter estimates, is summarized in Table (2).
Example 2
Our second data set was taken from a clinical trial aimed at testing the efficacy of an analgesic. Table (3) shows the relief times (in hours) of 50 arthritic patients treated with a fixed dosage of this medication. These data were first utilized by [30] and later by [31].
The validity of the Kumaraswamy model was again checked. It is observed that K-S = 0.08578 with p-value = 0.8249, A-D = 0.39141, and chi-square distance = 2.3616 with a corresponding p-value = 0.66958. This indicates that the Kumaraswamy model provides a good fit to the above data. Fig (9) gives the histogram of the data set and the plot of the fitted density. The QQ plot in Fig (10) suggests that the Kumaraswamy distribution is very suitable for this data set.
The parametric bootstrap percentile method is used to compute the parameter estimates, the bootstrap estimates (BootEst), and their corresponding standard errors (StdErr). A 95% confidence interval is calculated and reported as (LCL, UCL). The output of the bootstrap analysis, along with the parameter estimates, is summarized in Table (4). Tables (2) and (4) reveal that the PWM-based estimates consistently outperform all other estimation strategies: their bias and standard error are the smallest, and their confidence intervals are the shortest.
Simulation study
In this section, a simulation study is conducted to compare the performance of the different estimation procedures discussed in the previous sections. The performance of the different estimators is compared in terms of bias and mean squared error (MSE). Suppose θ̂i is the estimate of θ (= α, β) for the i-th simulated data set; then the absolute bias (Bias) and the MSE are computed as

Bias(θ̂) = (1/N) Σ_{i=1}^N |θ̂i − θ|  and  MSE(θ̂) = (1/N) Σ_{i=1}^N (θ̂i − θ)^2,

where N is the number of replications.
We perform the simulation study using SAS/IML, replicating the process 1000 times. In each replication, a random sample of size n (= 20, 30, 50, 80, 100, 200, 300, 500, 1000) is drawn from the Kumaraswamy distribution. The true parameter values used in the data-generating process are (α, β) = (2,2), (2,3), (2,4), (2,5), (2,10), (3,2), (3,3), (3,4), (3,5), (3,10), (4,2), (4,3), (4,4), (4,5), (4,10), (5,2), (5,3), (5,4), (5,5), (5,10). In each case, we compute 95% symmetric percentile bootstrap confidence intervals based on 500 bootstrap samples and obtain the average lengths (L) of the confidence intervals and their coverage percentages (CP). All results are reported in Tables 5–10.
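The replication loop described above can be sketched as follows (our illustration in Python rather than SAS/IML; `estimator` stands for any function returning (α̂, β̂), and the replication count is a parameter):

```python
import numpy as np

def kum_sample(n, a, b, rng):
    """Draw n variates from Kum(a, b) by inverting the cdf."""
    u = rng.random(n)
    return (1 - (1 - u)**(1.0 / b))**(1.0 / a)

def mc_bias_mse(estimator, n, a, b, reps=1000, seed=123):
    """Absolute bias and MSE of an (alpha, beta) estimator over
    `reps` simulated samples of size n from Kum(a, b)."""
    rng = np.random.default_rng(seed)
    est = np.array([estimator(kum_sample(n, a, b, rng)) for _ in range(reps)])
    truth = np.array([a, b])
    bias = np.abs(est - truth).mean(axis=0)   # (1/N) * sum |theta_hat - theta|
    mse = ((est - truth)**2).mean(axis=0)     # (1/N) * sum (theta_hat - theta)^2
    return bias, mse
```

The same skeleton extends to the bootstrap step: for each simulated sample, resample (or redraw parametrically) 500 times, take the 2.5% and 97.5% percentiles of the bootstrap estimates, and record interval length and whether the truth is covered.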
- It is immediate from Tables 5–7 that as the sample size increases, all estimators demonstrate the property of consistency, meaning their MSE values approach zero.
- The study also shows that the estimates based on PWM outperform all other estimates in terms of Bias and MSE values, demonstrating that this technique is useful for estimating the shape parameters of the Kumaraswamy distribution.
- It should also be noted that the estimates based on POME are equivalent to the estimates based on MLE in terms of Bias and MSE values, and this is true for all values of n, α and β.
- For all estimation approaches, the Bias of both α̂ and β̂ decreases as n increases, as expected.
- In terms of Bias and MSE values, Tables 5–7 clearly illustrate that POME-based and MLE-based estimates are nearly identical.
- Overall, the simulation results suggest that with a large sample size n, the differences in Bias and MSE between MLE, MOM, SD-PWM, and POME methods of estimation become minimal.
Tables 5–7 show the Bias and MSE values for each estimation method. It is therefore also important to understand how each method performs for interval estimation. Accordingly, at the 95% confidence level, we generated parametric bootstrap confidence intervals and assessed their coverage probability and average length. Tables 8–10 present a summary of our findings, from which the following conclusions may be inferred.
- As the sample size n increases, the average interval length becomes narrower and the coverage rate higher.
- The average length and coverage probability of the MLE are not satisfactory, whereas the PWM outperforms all other methods by providing the shortest average length and the highest coverage probability that is close to the nominal value.
- In terms of average length and coverage probability, all estimating methods for the shape parameter β are comparable. However, this is not the case, when estimating α.
Conclusions and remarks
In this paper, we have considered the MLE, MOM, PWM, SD-PWM and the POME to derive estimates for the shape parameters of the Kumaraswamy distribution. We conducted an extensive simulation analysis to compare these approaches with various sample sizes and unknown parameter combinations. The Bias, MSE and Bootstrap confidence interval length and coverage probability have been obtained. The simulation findings, as well as the results from the two real data sets, demonstrated that the PWM unequivocally outperforms all other estimating methods. Furthermore, POME and MLE are identical in their parameter estimates.
References
- 1. Rao A. R., & Hamed K. H. (2000). The Logistic Distribution. In: Flood Frequency Analysis. CRC Press, Boca Raton, Florida, USA, 291–321.
- 2. Papalexiou S. M., & Koutsoyiannis D. (2012). Entropy based derivation of probability distributions: A case study to daily rainfall. Advances in Water Resources, 45, 51–57.
- 3. Hradil Z., & Rehácek J. (2006). Likelihood and entropy for statistical inversion. Journal of Physics: Conference Series; 36 (1): 55. IOP Publishing.
- 4. Greenwood J. A., Landwehr J. M., Matalas N. C., & Wallis J. R. (1979). Probability weighted moments: definition and relation to parameters of several distributions expressable in inverse form. Water resources research; 15 (5): 1049–1054.
- 5. Landwehr J. M., Matalas N. C., & Wallis J. R. (1979). Probability weighted moments compared with some traditional techniques in estimating Gumbel parameters and quantiles. Water Resources Research; 15 (5): 1055–1064.
- 6. Hosking J. R. (1986). The theory of probability weighted moments. IBM Research Division, TJ Watson Research Center.
- 7. Haktanir T. (1997). Self-determined probability-weighted moments method and its application to various distributions. Journal of Hydrology, 194(1-4), 180–200.
- 8. Whalen T. M., Savage G. T., & Jeong G. D. (2002). The method of self-determined probability weighted moments revisited. Journal of Hydrology, 268(1-4), 177–191.
- 9. Whalen T. M., Savage G. T., & Jeong G. D. (2004). An evaluation of the self-determined probability-weighted moment method for estimating extreme wind speeds. Journal of Wind Engineering and Industrial Aerodynamics, 92(3-4), 219–239.
- 10. Shannon C. E. (1948). A mathematical theory of communication. Bell System Technical Journal; 27: 379–423, 623–656.
- 11. Jaynes E. T. (1957). Information theory and statistical mechanics. II. Physical Review; 108 (2): 171–190.
- 12. Guo L., & Garland M. (2006). The use of entropy minimization for the solution of blind source separation problems in image analysis. Pattern Recognition; 39 (6): 1066–1073.
- 13. Jaynes E. T. (1968). Prior probabilities. IEEE Transactions on systems science and cybernetics; 4 (3): 227–241.
- 14. Levine R. D., & Tribus M. (Eds.) (1979). The Maximum Entropy Formalism. MIT Press, Cambridge, MA.
- 15. Singh V. P., & Rajagopal A. K. (1986). A new method of parameter estimation for hydrologic frequency analysis. Hydrological Science and Technology; 2 (3): 33–40.
- 16. Kumaraswamy P. (1980). A generalized probability density function for double-bounded random processes. Journal of hydrology, 46(1-2), 79–88.
- 17. Ponnambalam K., Seifi A., & Vlach J. (2001). Probabilistic design of systems with general distributions of parameters. International journal of circuit theory and applications; 29 (6): 527–536.
- 18. Lemonte A. J. (2011). Improved point estimation for the Kumaraswamy distribution. Journal of Statistical Computation and Simulation; 81 (12): 1971–1982.
- 19. Dey S., Mazucheli J., & Nadarajah S. (2018). Kumaraswamy distribution: different methods of estimation. Computational and Applied Mathematics; 37 (2): 2094–2111.
- 20. Mitnik P. A. (2013). New properties of the Kumaraswamy distribution. Communications in Statistics-Theory and Methods; 42 (5): 741–755.
- 21. Garg M. (2009). On Generalized Order Statistics From Kumaraswamy Distribution. Tamsui Oxford Journal of Mathematical Sciences (TOJMS), 25(2).
- 22. Nadar M., Papadopoulos A., & Kızılaslan F. (2013). Statistical analysis for Kumaraswamy’s distribution based on record data. Statistical Papers; 54 (2): 355–369.
- 23. Gholizadeh R., Khalilpor M., & Hadian M. (2011). Bayesian estimations in the Kumaraswamy distribution under progressively type II censoring data. International Journal of Engineering, Science and Technology; 3 (9): 47–65.
- 24. Efron B., & Tibshirani R. J. (1994). An Introduction to the Bootstrap. CRC Press.
- 25. Singh V. P. (1998). Entropy-Based Parameter Estimation in Hydrology (Vol. 30). Springer Science & Business Media.
- 26. Singh V. P., & Deng Z. Q. (2003). Entropy-based parameter estimation for kappa distribution. Journal of Hydrologic Engineering; 8 (2): 81–92.
- 27. Song S., Song X., & Kang Y. (2017). Entropy-Based Parameter Estimation for the Four-Parameter Exponential Gamma Distribution. Entropy, 19(5), 189.
- 28. Dumonceaux R., & Antle C. E. (1973). Discrimination between the log-normal and the Weibull distributions. Technometrics; 15 (4): 923–926.
- 29. Zaghloul M., Papalexiou S. M., Elshorbagy A., & Coulibaly P. (2020). Revisiting flood peak distributions: A pan-Canadian investigation. Advances in Water Resources, 145, 103720.
- 30. Wingo D. R. (1983). Maximum likelihood methods for fitting the Burr type XII distribution to life test data. Biometrical journal; 25 (1): 77–84.
- 31. Soliman A. A., Abd Ellah A. H., Abou-Elheggag N. A., & Modhesh A. A. (2012). Estimation of the coefficient of variation for non-normal model using progressive first-failure-censoring data. Journal of Applied Statistics; 39 (12): 2741–2758.