# Efficient estimation of Pareto model: Some modified percentile estimators

• Sajjad Haider Bhatti ,

Roles Methodology, Software, Supervision, Writing – original draft, Writing – review & editing

Affiliation Department of Statistics, Government College University, Faisalabad, Pakistan

Roles Data curation, Formal analysis, Methodology, Project administration, Validation, Writing – original draft

Affiliation Department of Statistics, Government College University, Faisalabad, Pakistan

Roles Data curation, Formal analysis, Validation, Visualization, Writing – review & editing

Affiliation Department of Statistics, Government College University, Faisalabad, Pakistan

Roles Software, Supervision

Affiliation Department of Statistics, Bahauddin Zakariya University, Multan, Pakistan

Roles Data curation, Writing – original draft

Affiliation Department of Statistics, Government College University, Faisalabad, Pakistan

• Muhammad Ali Raza

Roles Formal analysis, Methodology, Software, Validation

Affiliation Department of Statistics, Government College University, Faisalabad, Pakistan


## Abstract

The article proposes three modified percentile estimators for parameter estimation of the Pareto distribution. These modifications are based on the median, the geometric mean and the expectation of the empirical cumulative distribution function of the first-order statistic. The proposed modified estimators are compared with traditional percentile estimators through a Monte Carlo simulation for different parameter combinations with varying sample sizes. The performance of the different estimators is assessed in terms of total mean square error and total relative deviation. It is determined that the modified percentile estimator based on the expectation of the empirical cumulative distribution function of the first-order statistic provides efficient and precise parameter estimates compared to the other estimators considered. The simulation results were further confirmed using two real-life examples where maximum likelihood and moment estimators were also considered.

## 1. Introduction

The Pareto distribution is widely applied in economics. It was initially introduced by Pareto [1] to represent the distribution of income among individuals. It is the most appropriate model for situations described by the 80–20 rule, that is, when 80% of the effect comes from 20% of the causes; for instance, a large portion of a society's wealth is owned by a small percentage of its people. The Pareto model has wide application in economic studies, as it plays a vital role in the investigation of several phenomena [2]. Although it is most widely used as an income model describing the allocation of wealth among individual units [3], it is not limited to economics: it has also proved useful in modeling the number of casualties in earthquakes, the areas of forest fires and the sizes of oil and gas fields [4]. The applicability of the Pareto model to real-life phenomena is evident in many studies [2,5–9]. Its generalized, exponentiated, modified, Kumaraswamy and transmuted versions have also been presented with real-life applications [10–14].

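The density function of the Pareto distribution is given as

$$f(x) = \frac{\alpha \beta^{\alpha}}{x^{\alpha + 1}}, \quad x \ge \beta,\ \alpha > 0,\ \beta > 0,$$

where β is the scale parameter and α is the shape parameter, and it is denoted by x ~ Pareto(β, α).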

Shapes of Probability Density Function (PDF) and Cumulative Distribution Function (CDF) for different combinations of scale and shape parameters are shown in Figs 1 and 2, respectively.
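The curves in such figures can be reproduced from the two functions alone. The following is a minimal sketch, assuming the standard two-parameter Pareto form with density αβ^α / x^(α+1) and distribution function 1 − (β/x)^α for x ≥ β; the function names `pareto_pdf` and `pareto_cdf` are illustrative and not taken from the paper.

```python
def pareto_pdf(x, beta, alpha):
    """Density of Pareto(beta, alpha): alpha * beta**alpha / x**(alpha + 1) for x >= beta."""
    if x < beta:
        return 0.0  # no probability mass below the scale parameter
    return alpha * beta ** alpha / x ** (alpha + 1)


def pareto_cdf(x, beta, alpha):
    """Distribution function of Pareto(beta, alpha): 1 - (beta / x)**alpha for x >= beta."""
    if x < beta:
        return 0.0
    return 1.0 - (beta / x) ** alpha


# Tabulate both functions for one of the parameter pairs used in the simulation study.
for x in (1.0, 1.5, 2.0, 4.0):
    print(x, pareto_pdf(x, beta=1.0, alpha=2.0), pareto_cdf(x, beta=1.0, alpha=2.0))
```

Evaluating the functions on a grid of x values for several (β, α) pairs gives the points needed to draw the PDF and CDF shapes.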

In the literature concerning parameter estimation, different estimation strategies have been used for the Pareto distribution. Quandt [15] derived expressions for moment, maximum likelihood, percentile and least squares estimators. Kulldorff and Vannman [16] proposed parameter estimation of the Pareto distribution by linear functions of order statistics. Afify [17] employed distinct estimation procedures for the parameters of the Pareto distribution and found that least squares estimators perform better in terms of root mean square error. Parameter estimation of the Pareto distribution has also been carried out using jackknife and minimum risk estimators [18]. Based on Monte Carlo simulation, Lu and Tao [19] showed that the maximum likelihood and weighted least squares methods were equally efficient.

The method of percentile estimation has been in use for a long time. Parameters of different probability distributions have been estimated using the percentile method, which was found to be better than, or as efficient as, maximum likelihood and least squares techniques [20–22].

In the literature devoted to parameter estimation, different modifications of standard estimation procedures have been proposed. Modified maximum likelihood estimators and modified moment estimators have been introduced and found to be more efficient than traditional estimators for different probability distributions such as the three-parameter log-normal [23,24], three-parameter Weibull [25], three-parameter Gamma [26], Rayleigh [27], two-parameter Exponential [28], and two-parameter Power Function [29].

Keeping in view the applicability and importance of the Pareto distribution in empirical studies, the usefulness of percentile estimation and the superiority of modified estimators for different distributions in the recent literature, the present study focuses on deriving modified percentile estimators for the Pareto distribution. The derived modifications are compared with the traditional percentile estimators through Monte Carlo simulation and two real-life datasets.

## 2. Methodology

In the present work, we suggest some modifications of the percentile estimation method using the median, the geometric mean and the expectation of the empirical cumulative distribution function of the first-order statistic of the Pareto distribution. The modified estimators are compared with the traditional percentile estimators.

### 2.1 Method of percentile estimation

Percentiles play an important role in descriptive statistics and their use is recommended for parameter estimation as well [30]. The principle is to equate two values of the cumulative distribution function with the corresponding percentiles and then solve the resulting equations simultaneously for the unknown parameters. Following Marks [22], Zaka and Akhter [29] and Sampath and Anjana [31], we have chosen P25 and P75, which are relatively more accurate than other pairs of percentiles.

### 2.2 Percentile estimator

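Let x1, x2,…,xn be a random sample of size n from the Pareto distribution. The cumulative distribution function of a Pareto distribution with shape and scale parameters α and β, respectively, is

$$F(x) = 1 - \left(\frac{\beta}{x}\right)^{\alpha}, \quad x \ge \beta.$$

Thus, using percentiles P75 and P25,

$$1 - \left(\frac{\beta}{P_{75}}\right)^{\alpha} = 0.75 \tag{1}$$

Similarly,

$$1 - \left(\frac{\beta}{P_{25}}\right)^{\alpha} = 0.25 \tag{2}$$

Solving Eqs (1) and (2) simultaneously for the unknown parameters, we get the percentile estimators for α and β as

$$\hat{\alpha} = \frac{\ln 3}{\ln\left(P_{75}/P_{25}\right)} \tag{3}$$

$$\hat{\beta} = P_{25}\,(0.75)^{1/\hat{\alpha}} \tag{4}$$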

Eqs (3) and (4) are the required percentile estimators of the Pareto distribution. For further reference, we name these estimators as PE.
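A minimal sketch of PE follows. It assumes the Pareto CDF F(x) = 1 − (β/x)^α, so that equating F(P25) = 0.25 and F(P75) = 0.75 gives α̂ = ln 3 / ln(P75/P25) and β̂ = P25 · 0.75^(1/α̂). The function name `percentile_estimates`, the example data and the quartile rule (linear interpolation, `method="inclusive"`) are assumptions of this sketch; the paper does not state which sample-quantile definition was used.

```python
import math
import statistics


def percentile_estimates(sample):
    """Return (alpha_hat, beta_hat) of the Pareto model from the sample quartiles."""
    # Assumption: quartiles are computed by linear interpolation between order statistics.
    p25, _, p75 = statistics.quantiles(sample, n=4, method="inclusive")
    # Dividing the two percentile equations eliminates beta and leaves alpha.
    alpha_hat = math.log(3.0) / math.log(p75 / p25)
    # Back-substitute into the P25 equation to recover beta.
    beta_hat = p25 * 0.75 ** (1.0 / alpha_hat)
    return alpha_hat, beta_hat


# Illustrative data only, not one of the paper's data sets.
data = [1.1, 1.3, 1.6, 2.0, 2.9, 4.5, 8.0]
print(percentile_estimates(data))
```

The estimator is undefined when P25 equals P75, which can happen in very small or heavily tied samples.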

### 2.3 Modified percentile estimator (I)

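Our first modification of the percentile estimation method replaces Eq (2) by the median of the Pareto distribution, equated to the sample median $\tilde{x}$, as

$$\tilde{x} = \beta\, 2^{1/\alpha} \tag{5}$$

Rewriting Eq (1) as

$$\beta = P_{75} \left(\frac{1}{4}\right)^{1/\alpha} \tag{6}$$

Solving Eqs (5) and (6) simultaneously, we get the first modified percentile estimator of α as

$$\hat{\alpha} = \frac{\ln 2}{\ln\left(P_{75}/\tilde{x}\right)} \tag{7}$$

Putting the value of $\hat{\alpha}$ from Eq (7) in Eq (6), we get the estimate of β as

$$\hat{\beta} = P_{75} \left(\frac{1}{4}\right)^{1/\hat{\alpha}} \tag{8}$$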

Eqs (7) and (8) provide expressions for first modified percentile estimators (PE-I, for further reference).
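PE-I can be sketched in the same way. The sketch assumes the Pareto median β · 2^(1/α) and the 75th percentile β · 4^(1/α), whose ratio gives α̂ = ln 2 / ln(P75 / median) and then β̂ = P75 · 0.25^(1/α̂). The name `pe1_estimates`, the example data and the quartile interpolation rule are assumptions of this sketch.

```python
import math
import statistics


def pe1_estimates(sample):
    """Return (alpha_hat, beta_hat) using the sample median and the 75th percentile."""
    # Assumption: P75 by linear interpolation between order statistics.
    _, _, p75 = statistics.quantiles(sample, n=4, method="inclusive")
    med = statistics.median(sample)
    # P75 / median = 2**(1 / alpha) under the Pareto model.
    alpha_hat = math.log(2.0) / math.log(p75 / med)
    # Solve the P75 equation for beta with alpha replaced by its estimate.
    beta_hat = p75 * 0.25 ** (1.0 / alpha_hat)
    return alpha_hat, beta_hat


# Illustrative data only.
data = [1.1, 1.3, 1.6, 2.0, 2.9, 4.5, 8.0]
print(pe1_estimates(data))
```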

### 2.4 Modified percentile estimator (II)

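Our second modified percentile estimator replaces Eq (2) by the Geometric Mean (GM) of the Pareto distribution, equated to the sample geometric mean, as

$$GM = \beta\, e^{1/\alpha} \tag{9}$$

Rewriting Eq (1) as

$$\beta = P_{75} \left(\frac{1}{4}\right)^{1/\alpha} \tag{10}$$

Solving Eqs (9) and (10) simultaneously, we get the second modified percentile estimator of α as

$$\hat{\alpha} = \frac{\ln 4 - 1}{\ln\left(P_{75}/GM\right)} \tag{11}$$

Putting the value of $\hat{\alpha}$ from Eq (11) in Eq (9), we get the estimate of β as

$$\hat{\beta} = GM\, e^{-1/\hat{\alpha}} \tag{12}$$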

Eqs (11) and (12) are the expressions for the second modified percentile (PE-II for further reference) estimators of the Pareto distribution.
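A minimal sketch of PE-II, assuming the Pareto geometric mean β · e^(1/α) and the 75th percentile β · 4^(1/α), so that α̂ = (ln 4 − 1) / ln(P75 / GM) and β̂ = GM · e^(−1/α̂). The name `pe2_estimates`, the example data, the quartile rule and the explicit guard are assumptions of this sketch.

```python
import math
import statistics


def pe2_estimates(sample):
    """Return (alpha_hat, beta_hat) using the geometric mean and the 75th percentile."""
    # Assumption: P75 by linear interpolation between order statistics.
    _, _, p75 = statistics.quantiles(sample, n=4, method="inclusive")
    gm = statistics.geometric_mean(sample)
    # In the population P75 > GM, but a few extreme values can push the sample GM above P75.
    if p75 <= gm:
        raise ValueError("sample P75 must exceed the sample geometric mean")
    # P75 / GM = (4 / e)**(1 / alpha) under the Pareto model.
    alpha_hat = (math.log(4.0) - 1.0) / math.log(p75 / gm)
    beta_hat = gm * math.exp(-1.0 / alpha_hat)
    return alpha_hat, beta_hat


# Illustrative data only.
data = [1.1, 1.3, 1.6, 2.0, 2.9, 4.5, 8.0]
print(pe2_estimates(data))
```

The guard makes visible one way in which extreme observations can destabilise this estimator in small samples.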

### 2.5 Modified percentile estimator (III)

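The third modified percentile estimator is obtained by replacing Eq (2) by the expectation of the empirical cumulative distribution function of the first-order statistic of the Pareto distribution.

Following [25,26,28,29], the expectation of the empirical CDF of the first-order statistic $x_{(1)}$ is defined as

$$E\left[F\left(x_{(1)}\right)\right] = \frac{1}{n+1} \tag{13}$$

So in the case of the Pareto distribution, $1 - \left(\beta / x_{(1)}\right)^{\alpha} = 1/(n+1)$, which gives

$$\hat{\beta} = x_{(1)} \left(\frac{n}{n+1}\right)^{1/\hat{\alpha}} \tag{14}$$

We have Eq (1) as

$$\beta = P_{75} \left(\frac{1}{4}\right)^{1/\alpha} \tag{15}$$

Comparing Eqs (14) and (15),

$$\hat{\alpha} = \frac{\ln\left(\dfrac{4n}{n+1}\right)}{\ln\left(P_{75}/x_{(1)}\right)} \tag{16}$$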

Eqs (14) and (16) give algebraic expressions for third modified percentile estimators (PE-III for further reference) of parameters of the Pareto distribution.
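A minimal sketch of PE-III, assuming that the expected value of the CDF at the sample minimum x(1) is 1/(n + 1). Under the Pareto CDF this gives β = x(1) · (n/(n + 1))^(1/α), and combining it with the 75th percentile relation β = P75 · 0.25^(1/α) yields α̂ = ln(4n/(n + 1)) / ln(P75 / x(1)). The name `pe3_estimates`, the example data and the quartile rule are assumptions of this sketch.

```python
import math
import statistics


def pe3_estimates(sample):
    """Return (alpha_hat, beta_hat) using the sample minimum and the 75th percentile."""
    n = len(sample)
    x1 = min(sample)  # first-order statistic
    # Assumption: P75 by linear interpolation between order statistics.
    _, _, p75 = statistics.quantiles(sample, n=4, method="inclusive")
    # Equate the two expressions for beta and solve for alpha.
    alpha_hat = math.log(4.0 * n / (n + 1)) / math.log(p75 / x1)
    # Shrink the sample minimum slightly, since x1 always lies above the true scale.
    beta_hat = x1 * (n / (n + 1)) ** (1.0 / alpha_hat)
    return alpha_hat, beta_hat


# Illustrative data only.
data = [1.1, 1.3, 1.6, 2.0, 2.9, 4.5, 8.0]
print(pe3_estimates(data))
```

Because x(1) ≥ β by construction, the resulting scale estimate is always below the sample minimum, which a percentile-only estimator does not guarantee.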

### 2.6 Performance indices

In order to compare the efficiency and accuracy of different estimators, Total Mean Square Error (TMSE) and Total Relative Deviation (TRD) were used as performance indices. These measures are frequently used as performance criteria when different estimators (or estimation strategies) are compared through Monte Carlo simulation [28,29,32–39].

These performance indices are defined as, (17) and (18) where α and β are the true parameters, REP is the number of replications, while $\hat{\alpha}$ and $\hat{\beta}$ are the parameter estimates.
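The sketch below shows one common way such indices are computed in simulation studies. It is an assumption, not a transcription of the paper's definitions: TMSE is taken as the sum of the mean squared errors of α̂ and β̂ over the replications, and TRD as the sum of the absolute relative deviations of the average estimates from the true values. The function name `tmse_trd` is illustrative.

```python
def tmse_trd(estimates, alpha, beta):
    """Compute (TMSE, TRD) from a list of (alpha_hat, beta_hat) pairs, one per replication."""
    rep = len(estimates)
    # Mean squared error of each parameter over the replications.
    mse_alpha = sum((a - alpha) ** 2 for a, _ in estimates) / rep
    mse_beta = sum((b - beta) ** 2 for _, b in estimates) / rep
    # Average estimate of each parameter over the replications.
    mean_alpha = sum(a for a, _ in estimates) / rep
    mean_beta = sum(b for _, b in estimates) / rep
    tmse = mse_alpha + mse_beta
    # Assumed TRD form: relative deviation of the average estimates from the truth.
    trd = abs(mean_alpha - alpha) / alpha + abs(mean_beta - beta) / beta
    return tmse, trd


# Two hypothetical replications around the true values alpha = 1, beta = 2.
print(tmse_trd([(1.1, 2.1), (0.9, 1.9)], alpha=1.0, beta=2.0))
```

Under this form, TMSE penalises both bias and variance, while TRD reflects only the bias of the averaged estimates, so the two indices can rank estimators differently.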

As the true parameters are unknown for real-life data sets, total mean square error and total relative deviation cannot be used for assessing the performance of estimators in such cases. Therefore, we have used Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE) and Root Mean Square Percentage Error (RMSPE) as performance measures for comparison among the different estimators. These measures are defined as, (19) (20) (21) (22) where $S(x_i)$ is the sample (observed) distribution function and $\hat{F}(x_i)$ is the expected distribution function, that is, the Pareto CDF evaluated with the parameter estimates ($\hat{\alpha}$ and $\hat{\beta}$) from any particular method.
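A sketch of how these four measures could be computed is given below. Several choices are assumptions rather than the paper's definitions: the observed distribution function is taken as S(x(i)) = i/n at the i-th ordered observation, the expected one as 1 − (β̂/x)^α̂, percentage errors are taken relative to S and reported in percent, and the name `fit_measures` is illustrative.

```python
import math


def fit_measures(sample, alpha_hat, beta_hat):
    """Return MAE, MAPE, RMSE and RMSPE between observed and fitted Pareto CDF values."""
    xs = sorted(sample)
    n = len(xs)
    abs_err = sq_err = abs_pct = sq_pct = 0.0
    for i, x in enumerate(xs, start=1):
        s = i / n  # assumed empirical distribution function at the i-th order statistic
        f = 1.0 - (beta_hat / x) ** alpha_hat if x >= beta_hat else 0.0  # fitted CDF
        d = s - f
        abs_err += abs(d)
        sq_err += d * d
        abs_pct += abs(d) / s  # relative to the observed value (assumption)
        sq_pct += (d / s) ** 2
    return {
        "MAE": abs_err / n,
        "MAPE": 100.0 * abs_pct / n,
        "RMSE": math.sqrt(sq_err / n),
        "RMSPE": 100.0 * math.sqrt(sq_pct / n),
    }


# Illustrative data and estimates only.
print(fit_measures([1.0, 2.0, 4.0], alpha_hat=1.0, beta_hat=1.0))
```

Smaller values of all four measures indicate a fitted distribution function that lies closer to the observed one.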

## 3. Monte Carlo simulation

A Monte Carlo simulation study was performed to compare the proposed modified percentile estimators with traditional percentile estimation. This comparison was carried out by taking random samples of different sizes (n = 20, 50, 100, 200, 500 and 1000) with different pairs of parameter values (β, α) = (1, 0.5), (1, 1), (1, 2), (2, 1).

For any combination of true parameters (β and α), the Monte Carlo simulation was performed by carrying out the following steps in R [40].

• A sample of n uniform random numbers was generated in interval [0,1].
• The uniform random numbers u were converted into Pareto random variables by inverting the cumulative distribution function, $x = \beta (1 - u)^{-1/\alpha}$.
• The process in above steps was repeated 10000 times.
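The steps above can be sketched as follows. The original study was run in R; this Python version is an illustration under stated assumptions: the inverse-CDF relation x = β(1 − u)^(−1/α), a fixed seed, and the hypothetical names `pareto_sample` and `simulate`. Any estimator that maps a sample to an estimate can be passed in; the sample minimum is used here only as a placeholder.

```python
import random


def pareto_sample(n, beta, alpha, rng):
    """Draw n Pareto(beta, alpha) values by inverse-transform sampling."""
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and the power is always defined.
    return [beta * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def simulate(estimator, n, beta, alpha, reps=10000, seed=0):
    """Apply `estimator` to `reps` independent samples of size n and collect the results."""
    rng = random.Random(seed)  # seeded for reproducibility (an assumption of this sketch)
    return [estimator(pareto_sample(n, beta, alpha, rng)) for _ in range(reps)]


# Placeholder estimator: the sample minimum as a crude estimate of the scale beta.
minima = simulate(min, n=20, beta=1.0, alpha=2.0, reps=100)
print(sum(minima) / len(minima))
```

The collected estimates from each replication are what the performance indices are then averaged over.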

## 4. Results and discussion

Tables 1–4 present the results of the Monte Carlo simulation study carried out for numerical evaluation of the estimators considered for different sample sizes and different parameter combinations.

Results from Table 1 (for β = 1; α = 0.5) show that the modified percentile estimator PE-III (which is based on the expectation of the empirical cumulative distribution function of the first-order statistic) estimated the true parameters more accurately than the traditional percentile estimator and the other modified percentile estimators (based on the median and the geometric mean). Under the total mean square error criterion, PE-III provided more efficient parameter estimates for all sample sizes, as it has lower total mean square error than the competing estimators. The second performance criterion, total relative deviation, leads to the same conclusion for all sample sizes: the third modified percentile estimator is the most efficient among all estimation strategies considered. It is worth noting that the traditional percentile estimator is the second-best choice after the third modified percentile estimator.

Concerning the literature devoted to modified estimators, our results agree with other studies favouring the use of modified maximum likelihood, moment and percentile estimation for different probability distributions [25–29].

To avoid repetition, it can be stated that PE-III provides more efficient and accurate parameter estimates than the other estimators considered for all sample sizes for the parameter combinations (β = 1, α = 1), (β = 1, α = 2) and (β = 2, α = 1) presented in Tables 2–4, respectively.

Moreover, from the results in Tables 1–4, it can also be observed that the modified estimator PE-II (based on the geometric mean) is the worst performer in terms of both performance indicators. However, its performance improves gradually with increasing sample size. The reason behind its poor performance in small samples may be that the geometric mean is influenced by extreme values, which are common in heavy-tailed distributions like the Pareto.

## 5. Real life examples

In addition to numerical evaluation of proposed estimators through simulation study, the modified percentile estimators were applied on two real life data sets.

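For comparison purposes, we have also used the maximum likelihood and moment estimators of the Pareto distribution. The Maximum Likelihood (ML) estimators of α and β are

$$\hat{\alpha}_{ML} = \frac{n}{\sum_{i=1}^{n} \ln\left(x_i / \hat{\beta}_{ML}\right)} \tag{23}$$

$$\hat{\beta}_{ML} = x_{(1)} \tag{24}$$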

Similarly, the estimators from Method of Moments (MM) are (25) (26)
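For the maximum likelihood benchmark, a minimal sketch is given below. It uses the standard ML estimators of the two-parameter Pareto distribution: the scale estimate is the sample minimum and the shape estimate is n divided by the sum of ln(x_i / β̂). The function name `ml_estimates` and the example data are illustrative. No sketch is given for the moment estimators, since their exact form is not recoverable from the text.

```python
import math


def ml_estimates(sample):
    """Return the maximum likelihood estimates (alpha_hat, beta_hat) of Pareto(beta, alpha)."""
    n = len(sample)
    # The likelihood increases in beta up to the smallest observation.
    beta_hat = min(sample)
    # Given beta_hat, the log-likelihood is maximised in closed form for alpha.
    log_sum = sum(math.log(x / beta_hat) for x in sample)
    alpha_hat = n / log_sum
    return alpha_hat, beta_hat


# Illustrative data only.
data = [1.1, 1.3, 1.6, 2.0, 2.9, 4.5, 8.0]
print(ml_estimates(data))
```

The shape estimate is undefined when all observations are equal, because the sum of logarithms is then zero.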

Example 1: The first example is taken from Clark [9]. It consists of 21 observations on the number of deaths in major earthquakes during 1900–2011, as published by the United States Geological Survey. The results from applying the proposed estimators to example 1 are presented in Table 5.

Results from Table 5 clearly indicate the superiority of PE-III over the other percentile-based estimators as well as over the maximum likelihood and moment estimators. All four performance measures have smaller values for PE-III than for the other estimators.

Example 2: The second data set is taken from Beirlant et al. [41] and consists of 142 values of fire damage claims (in 1000’s of Norwegian Krones) in Norway during 1975. This data set has also been used by other studies focusing on the Pareto distribution [3,42,43].

Table 6 shows that, based on three performance indices, the third modified percentile estimator (PE-III) is better than the traditional percentile (PE), maximum likelihood (ML), moment (MM) and other modified percentile estimators (PE-I, PE-II). However, maximum likelihood estimation performs slightly better than PE-III in terms of mean absolute error.

## 6. Conclusion

Three modified percentile estimators are proposed for parameter estimation of the Pareto distribution. The modifications are based on the median, the geometric mean and the expectation of the empirical cumulative distribution function of the first-order statistic of the Pareto distribution. The newly proposed estimators are compared with the traditional percentile estimators via Monte Carlo simulation, and the modified percentile estimator based on the expectation of the empirical cumulative distribution function of the first-order statistic is found to perform better than the traditional and the other modified percentile estimators in terms of total mean square error and total relative deviation. The Monte Carlo simulation results were further corroborated by applying the proposed estimators to two real-life examples. These applications show that the modified percentile estimator based on the expectation of the empirical cumulative distribution function of the first-order statistic performs better than not only the other percentile-based estimators but also the maximum likelihood and moment estimators. Considering the results from the simulation and the real data applications, the use of modified percentile estimation can be recommended for estimating the parameters of the Pareto distribution.

## Supporting information

### S1 Data. MINIMAL DATA.xlsx.

https://doi.org/10.1371/journal.pone.0196456.s001

(XLSX)

## Acknowledgments

We acknowledge with thanks Dr. Shahid Mehmood (University of Oxford, UK) for proofreading and grammatical corrections.

## References

1. Pareto V. The New Theories of Economics. J Polit Econ. 1897;5: 485–502.
2. Arnold BC. Encyclopedia of statistical sciences. John Wiley. 2008.
3. Munir R, Saleem M, Aslam M, Ali S. Comparison of different methods of parameters estimation for Pareto model. Casp J Appl Sci Res. 2013;2: 45–56.
4. Burroughs SM, Tebbens SF. Upper-truncated power law distributions. Fractals. 2001;9: 209–222.
5. Abdel-All NH, Mahmoud MAW, Abd-Ellah HN. Geometrical properties of Pareto distribution. Appl Math Comput. 2003;145: 321–339.
6. Sankaran PG, Nair MT. On finite mixture of Pareto distributions. Calcutta Stat Assoc Bull. 2005;57: 225–226.
7. Newman MEJ. Power laws, Pareto distributions and Zipf’s law. Contemp Phys. 2005;46: 323–351.
8. Aban IB, Meerschaert MM, Panorska AK. Parameter estimation for the truncated Pareto distribution. J Am Stat Assoc. 2006;101: 270–277.
9. Clark DR. A Note on the Upper-truncated Pareto distribution. Casualty Actuarial Society E-Forum, Winter. 2013. pp. 1–22.
10. Castillo E, Hadi AS. Fitting the generalized Pareto distribution to data. J Am Stat Assoc. 1997;92: 1609–1620.
11. Shawky AI, Abu-Zinadah HH. Exponentiated Pareto distribution: different method of estimations. Int J Contemp Mathematical Sci. 2009;4: 677–693.
12. Muralidharan K, Khabia A. A modified Pareto distribution. J Indian Stat Assoc. 2011;49: 73–90.
13. Bourguignon M, Silva RB, Zea LM, Cordeiro GM. The Kumaraswamy Pareto distribution. J Stat Theory Appl. 2013;12: 129–144.
14. Bourguignon M, Ghosh I, Cordeiro GM. General results for the transmuted family of distributions and new models. J Probab Stat. 2016;
15. Quandt RE. Old and new methods of estimation and the Pareto distribution. Metrika. Springer; 1966;10: 55–82.
16. Kulldorff G, Vannman K. Estimation of the location and scale parameters of a Pareto distribution by linear functions of order statistics. J Am Stat Assoc. 1973;68: 218–227.
17. Afify EE. Estimation of parameters for Pareto distribution. Faculty of Eng. Shibeen El Kom Menoufia University. Working Paper; 2003.
18. Kang S-B, Cho Y-S. Estimation of the parameters in a Pareto distribution by jackknife and bootstrap method. J Inf Optim Sci. 1997;18: 289–300.
19. Lu H-L, Tao S-H. The estimation of Pareto distribution by a weighted least square method. Qual Quant. 2007;41: 913–926.
20. Dubey SD. Some percentile estimators for Weibull parameters. Technometrics. 1967;9: 119–129.
21. Wang F, Keats JB. Improved percentile estimation for the two parameter Weibull distribution. Microelectron Reliab. 1995;35: 883–892.
22. Marks NB. Estimation of Weibull parameters from common percentiles. J Appl Stat. Taylor & Francis; 2005;32: 17–24.
23. Cohen AC, Whitten BJ. Estimation in the three-parameter Lognormal distribution. J Am Stat Assoc. 1980;75: 399–404.
24. Iwase K, Kanefuji K. Estimation for 3-parameter Lognormal distribution with unknown shifted origin. Stat Pap. 1994;35: 81–90.
25. Cohen AC, Whitten B. Modified maximum likelihood and modified moment estimators for the three-parameter Weibull distribution. Commun Stat Theory Methods. 1982;11: 2631–2656.
26. Cohen AC, Whitten BJ. Modified moment and maximum likelihood estimators for parameters of the three-parameter Gamma distribution. Commun Stat Comput. 1982;11: 197–216.
27. Lalitha S, Mishra A. Modified maximum likelihood estimation for Rayleigh distribution. Commun Stat Theory Methods. 1996;25: 389–401.
28. Rashid MZ, Akhter AS. Estimation accuracy of Exponential distribution parameters. Pakistan J Stat Oper Res. 2011;7: 217–232.
29. Zaka A, Akhter AS. Modified moment, maximum likelihood and percentile estimators for the parameters of the Power Function distribution. Pakistan J Stat Oper Res. 2014;10: 361–368.
30. Schoonjans F, De Bacquer D, Schmid P. Estimation of population percentiles. Epidemiology. LWW; 2011;22: 750–751.
31. Sampath S, Anjana K. Percentile matching estimation of uncertainty distribution. J Uncertain Anal Appl. 2016;4: 6.
32. Al-Fawzan MA. Methods for estimating the parameters of the Weibull distribution. King Abdulaziz City Sci Technol. 2000.
33. Kibria BMG. Performance of some new ridge regression estimators. Commun Stat Simul Comput. 2003;32: 419–435.
34. Clark AE, Troskie CG. Ridge regression: A simulation study. Commun Stat Simul Comput. 2006;35: 605–619.
35. Khalaf G, Månsson K, Shukur G. Modified ridge regression estimators. Commun Stat Theory Methods. 2013;42: 1476–1487.
36. Pobočíková I, Sedliačková Z. Comparison of four methods for estimating the Weibull distribution parameters. Appl Math Sci. 2014;8: 4137–4149.
37. Aslam M. Performance of Kibria’s method for the heteroscedastic ridge regression model: some Monte Carlo evidence. Commun Stat Simul Comput. 2014;43: 673–686.
38. Aydin D, Şenoğlu B. Monte Carlo comparison of the parameter estimation methods for the two-parameter Gumbel distribution. J Mod Appl Stat Methods. 2015;14: 123–140.
39. Shakeel M, ul Haq MA, Hussain I, Abdulhamid AM, Faisal M. Comparison of two new robust parameter estimation methods for the Power Function distribution. PLoS One. 2016;11: e0160692. pmid:27500404
40. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria; 2016.
41. Beirlant J, Teugels JL, Vynckier P. Practical Analysis of Extreme Values. Leuven University Press, Leuven, Belgium; 1996.
42. Rizzo ML. New goodness-of-fit tests for Pareto distributions. ASTIN Bull J IAA. 2009;39: 691–715.
43. Obradović M. On asymptotic efficiency of goodness of fit tests for Pareto distribution based on characterizations. Filomat. 2015;29: 2311–2324.