Efficient estimation of Pareto model: Some modified percentile estimators

This article proposes three modified percentile estimators for the parameters of the Pareto distribution. The modifications are based on the median, the geometric mean, and the expectation of the empirical cumulative distribution function of the first-order statistic. The proposed estimators are compared with traditional percentile estimators through a Monte Carlo simulation over different parameter combinations and sample sizes. Performance is assessed in terms of total mean square error and total relative deviation. The modified percentile estimator based on the expectation of the empirical cumulative distribution function of the first-order statistic provides more efficient and precise parameter estimates than the other estimators considered. The simulation results are further confirmed using two real-life examples, in which maximum likelihood and moment estimators are also considered.


Introduction
The Pareto distribution is widely used in economics. It was introduced by Pareto [1] to represent the distribution of income among individuals. It is the most appropriate model for situations described by the 80-20 rule, that is, when 80% of the effect comes from 20% of the causes; for example, a large portion of a society's wealth is owned by a small percentage of its people. The Pareto model has wide application in economic studies, where it plays a vital role in the investigation of several phenomena [2]. Although it is most widely used as an income model describing the allocation of wealth among individual units [3], it is not limited to economics: it is also useful for modeling the number of casualties in earthquakes, the areas of forest fires, and the sizes of oil and gas fields [4]. The applicability of the Pareto model to real-life phenomena is evident in many studies [2][5][6][7][8][9]. Its generalized, exponentiated, modified, Kumaraswamy and transmuted versions have also been presented with real-life applications [10][11][12][13][14].
The density function of the Pareto distribution is given as
$$f(x;\beta,\alpha) = \frac{\alpha\,\beta^{\alpha}}{x^{\alpha+1}}, \qquad x \ge \beta > 0,\ \alpha > 0,$$
where β is the scale parameter and α is the shape parameter; this is denoted by X ~ Pareto(β, α). Shapes of the Probability Density Function (PDF) and Cumulative Distribution Function (CDF) for different combinations of the scale and shape parameters are shown in Figs 1 and 2, respectively.
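As a quick numerical illustration of the density and distribution function of the Pareto(β, α) model, the following Python sketch evaluates both. The function names `pareto_pdf` and `pareto_cdf` are hypothetical helpers introduced here for illustration; they are not part of the original study, which reports using R.

```python
def pareto_pdf(x, beta, alpha):
    """Density alpha * beta**alpha / x**(alpha + 1) for x >= beta, else 0."""
    if x < beta:
        return 0.0
    return alpha * beta ** alpha / x ** (alpha + 1)

def pareto_cdf(x, beta, alpha):
    """Distribution function 1 - (beta / x)**alpha for x >= beta, else 0."""
    if x < beta:
        return 0.0
    return 1.0 - (beta / x) ** alpha

# The median of Pareto(beta, alpha) is beta * 2**(1/alpha),
# so the CDF evaluated there should be close to 0.5.
beta, alpha = 1.0, 2.0
median = beta * 2.0 ** (1.0 / alpha)
print(pareto_cdf(median, beta, alpha))
```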
In the literature on parameter estimation, different estimation strategies have been used for the Pareto distribution. Quandt [15] derived expressions for moment, maximum likelihood, percentile and least squares estimators. Kulldorff and Vännman [16] proposed estimating the parameters of the Pareto distribution by linear functions of order statistics. Afify [17] employed several estimation procedures for the Pareto distribution and found that least squares estimators perform better in terms of root mean square error. Parameter estimation of the Pareto distribution has also been carried out using jackknife and minimum risk estimators [18]. Based on Monte Carlo simulation, Lu and Tao [19] showed that maximum likelihood and weighted least squares methods were equally efficient. The method of percentile estimation has been in use for a long time. Parameters of different probability distributions have been estimated using the percentile method, which was found to be better than or as efficient as maximum likelihood and least squares techniques [20][21][22].
In the literature devoted to parameter estimation, different modifications have been proposed to standard estimation procedures. Modified maximum likelihood and modified moment estimators have been introduced and found to be more efficient than traditional estimators for different probability distributions, such as the three-parameter log-normal [23][24], three-parameter Weibull [25], three-parameter gamma [26], Rayleigh [27], two-parameter exponential [28], and two-parameter power function [29] distributions.
Given the applicability and importance of the Pareto distribution in empirical studies, the long-standing use of percentile estimation, and the superiority of modified estimators for different distributions in the recent literature, the present study focuses on deriving modified percentile estimators for the Pareto distribution. The derived modifications are compared with traditional percentile estimators through Monte Carlo simulation and two real-life data sets.

Methodology
In the present work, we suggest modifications to the percentile estimation method using the median, the geometric mean, and the expectation of the empirical cumulative distribution function of the first-order statistic of the Pareto distribution. The modified estimators are compared with the traditional percentile estimators.

Method of percentile estimation
Percentiles play an important role in descriptive statistics, and their use is recommended for parameter estimation as well [30]. The principle is to equate the cumulative distribution function at two points with the corresponding percentiles and then solve the resulting equations simultaneously for the unknown parameters. Following Marks [22], Zaka and Akhtar [29] and Sampath and Anjana [31], we have chosen $P_{25}$ and $P_{75}$, which are relatively more accurate than other pairs of percentiles.

Percentile estimator
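Let $x_1, x_2, \ldots, x_n$ be a random sample of size n from the Pareto distribution. The cumulative distribution function of a Pareto distribution with shape and scale parameters α and β, respectively, is
$$F(x) = 1 - \left(\frac{\beta}{x}\right)^{\alpha}, \qquad x \ge \beta.$$
Thus, using percentiles $P_{75}$ and $P_{25}$,
$$1 - \left(\frac{\beta}{P_{75}}\right)^{\alpha} = 0.75. \tag{1}$$
Similarly,
$$1 - \left(\frac{\beta}{P_{25}}\right)^{\alpha} = 0.25. \tag{2}$$
Solving Eqs (1) and (2) simultaneously for the unknown parameters, we get the percentile estimators for α and β as
$$\hat{\alpha} = \frac{\ln 3}{\ln\left(P_{75}/P_{25}\right)}, \tag{3}$$
$$\hat{\beta} = P_{25}\,(0.75)^{1/\hat{\alpha}}. \tag{4}$$
Eqs (3) and (4) are the required percentile estimators of the Pareto distribution. For further reference, we name these estimators PE.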
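The traditional percentile estimator can be sketched in Python as follows. The sketch assumes that equating the Pareto CDF to 0.75 at $P_{75}$ and to 0.25 at $P_{25}$ gives $(P_{75}/P_{25})^{\alpha} = 3$. The function names are hypothetical, and the sample-quantile rule (the default of `statistics.quantiles`) is an assumption, since the text does not state which percentile definition was used.

```python
import math
import statistics

def pe_from_percentiles(p25, p75):
    """Traditional percentile estimator (PE) from the 25th and 75th percentiles."""
    # (beta/P75)**alpha = 0.25 and (beta/P25)**alpha = 0.75 give (P75/P25)**alpha = 3.
    alpha_hat = math.log(3.0) / math.log(p75 / p25)
    beta_hat = p25 * 0.75 ** (1.0 / alpha_hat)
    return alpha_hat, beta_hat

def pe_estimate(sample):
    """Apply PE to raw data; the quantile rule used here is an assumption."""
    p25, _, p75 = statistics.quantiles(sample, n=4)
    return pe_from_percentiles(p25, p75)

# Sanity check with the exact population percentiles of Pareto(beta=2, alpha=1.5).
beta, alpha = 2.0, 1.5
p25 = beta * 0.75 ** (-1.0 / alpha)
p75 = beta * 0.25 ** (-1.0 / alpha)
print(pe_from_percentiles(p25, p75))  # close to (1.5, 2.0)
```

Feeding the exact population percentiles back in recovers the true parameters, which is a convenient check that the two estimating equations were inverted correctly.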

Modified percentile estimator (I)
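Our first modification of the percentile method replaces Eq (2) by the median of the Pareto distribution,
$$\tilde{X} = \beta\,2^{1/\alpha}, \tag{5}$$
where $\tilde{X}$ is estimated by the sample median. Rewriting Eq (1) as
$$\beta = P_{75}\,(0.25)^{1/\alpha} \tag{6}$$
and solving Eqs (5) and (6) simultaneously, we get the first modified percentile estimator for α,
$$\hat{\alpha} = \frac{\ln 2}{\ln\left(P_{75}/\tilde{X}\right)}. \tag{7}$$
Putting the value of $\hat{\alpha}$ from Eq (7) in Eq (6) gives the estimator of β,
$$\hat{\beta} = P_{75}\,(0.25)^{1/\hat{\alpha}}. \tag{8}$$
Eqs (7) and (8) provide expressions for the first modified percentile estimators (PE-I, for further reference).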
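A minimal Python sketch of PE-I follows. It assumes that the retained percentile equation is the one for $P_{75}$, so that combining it with the Pareto median $\beta\,2^{1/\alpha}$ gives $\hat{\alpha} = \ln 2 / \ln(P_{75}/\tilde{X})$. The function names and the sample-quantile rule are assumptions made for illustration.

```python
import math
import statistics

def pe1_from_stats(median, p75):
    """First modified percentile estimator (PE-I) from the median and P75."""
    # P75 = beta * 4**(1/alpha) and median = beta * 2**(1/alpha),
    # so P75 / median = 2**(1/alpha).
    alpha_hat = math.log(2.0) / math.log(p75 / median)
    beta_hat = p75 * 0.25 ** (1.0 / alpha_hat)
    return alpha_hat, beta_hat

def pe1_estimate(sample):
    """Apply PE-I to raw data using the sample median and sample P75."""
    median = statistics.median(sample)
    p75 = statistics.quantiles(sample, n=4)[2]
    return pe1_from_stats(median, p75)

# Check with exact population values for Pareto(beta=2, alpha=1.5).
beta, alpha = 2.0, 1.5
print(pe1_from_stats(beta * 2.0 ** (1.0 / alpha), beta * 4.0 ** (1.0 / alpha)))
```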

Modified percentile estimator (II)
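Our second modified percentile estimator is based on replacing Eq (2) by the geometric mean (GM) of the Pareto distribution,
$$GM = \beta\,e^{1/\alpha}. \tag{9}$$
Rewriting Eq (1) as
$$\beta = P_{75}\,(0.25)^{1/\alpha} \tag{10}$$
and solving Eqs (9) and (10) simultaneously, we get
$$\hat{\alpha} = \frac{\ln 4 - 1}{\ln\left(P_{75}/GM\right)}. \tag{11}$$
Putting the value of $\hat{\alpha}$ from Eq (11) in Eq (9), we get the estimate of β as
$$\hat{\beta} = GM\,e^{-1/\hat{\alpha}}. \tag{12}$$
Eqs (11) and (12) are the expressions for the second modified percentile estimators (PE-II, for further reference) of the Pareto distribution.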
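The following Python sketch illustrates PE-II under the same assumption that the retained percentile equation is the one for $P_{75}$. With the Pareto geometric mean $\beta\,e^{1/\alpha}$, this gives $\hat{\alpha} = (\ln 4 - 1)/\ln(P_{75}/GM)$. The function names and the sample-quantile rule are hypothetical choices.

```python
import math
import statistics

def pe2_from_stats(gm, p75):
    """Second modified percentile estimator (PE-II) from the geometric mean and P75."""
    # P75 = beta * 4**(1/alpha) and GM = beta * exp(1/alpha),
    # so ln(P75 / GM) = (ln 4 - 1) / alpha.
    alpha_hat = (math.log(4.0) - 1.0) / math.log(p75 / gm)
    beta_hat = gm * math.exp(-1.0 / alpha_hat)
    return alpha_hat, beta_hat

def pe2_estimate(sample):
    """Apply PE-II to raw data using the sample geometric mean and sample P75."""
    gm = statistics.geometric_mean(sample)
    p75 = statistics.quantiles(sample, n=4)[2]
    return pe2_from_stats(gm, p75)

# Check with exact population values for Pareto(beta=2, alpha=1.5).
beta, alpha = 2.0, 1.5
print(pe2_from_stats(beta * math.exp(1.0 / alpha), beta * 4.0 ** (1.0 / alpha)))
```

In small samples the sample geometric mean can reach or exceed the sample $P_{75}$, in which case this expression for $\hat{\alpha}$ is undefined or negative; a practical implementation would need to guard against that case.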

Modified percentile estimator (III)
The third modified percentile estimator proposed is obtained by replacing Eq (2) by expectation of empirical cumulative distribution function of first order statistic of Pareto distribution.
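Following [25,26,28,29], the expectation of the empirical CDF of the first-order statistic is defined as
$$E\left[F\left(x_{(1)}\right)\right] = \frac{1}{n+1}. \tag{13}$$
So, in the case of the Pareto distribution,
$$1 - \left(\frac{\beta}{x_{(1)}}\right)^{\alpha} = \frac{1}{n+1}, \quad\text{that is,}\quad \hat{\beta} = x_{(1)}\left(\frac{n}{n+1}\right)^{1/\hat{\alpha}}. \tag{14}$$
We have Eq (1) as
$$\beta = P_{75}\,(0.25)^{1/\alpha}. \tag{15}$$
Comparing Eqs (14) and (15),
$$\hat{\alpha} = \frac{\ln\left(4n/(n+1)\right)}{\ln\left(P_{75}/x_{(1)}\right)}. \tag{16}$$
Eqs (14) and (16) give algebraic expressions for the third modified percentile estimators (PE-III, for further reference) of the parameters of the Pareto distribution.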
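PE-III can be sketched in Python as below. The sketch assumes that setting the Pareto CDF at the sample minimum $x_{(1)}$ equal to $1/(n+1)$ and combining it with the $P_{75}$ equation yields $\hat{\alpha} = \ln(4n/(n+1))/\ln(P_{75}/x_{(1)})$. The function names and the sample-quantile rule are assumptions.

```python
import math
import statistics

def pe3_from_stats(x_min, p75, n):
    """Third modified percentile estimator (PE-III) from the minimum, P75 and n."""
    # (beta/x_min)**alpha = n/(n+1) and (beta/P75)**alpha = 1/4,
    # so (P75/x_min)**alpha = 4n/(n+1).
    alpha_hat = math.log(4.0 * n / (n + 1.0)) / math.log(p75 / x_min)
    beta_hat = x_min * (n / (n + 1.0)) ** (1.0 / alpha_hat)
    return alpha_hat, beta_hat

def pe3_estimate(sample):
    """Apply PE-III to raw data using the sample minimum and sample P75."""
    p75 = statistics.quantiles(sample, n=4)[2]
    return pe3_from_stats(min(sample), p75, len(sample))

# Check with values that satisfy both equations exactly for beta=2, alpha=1.5, n=20.
beta, alpha, n = 2.0, 1.5, 20
x_min = beta * (n / (n + 1.0)) ** (-1.0 / alpha)
print(pe3_from_stats(x_min, beta * 4.0 ** (1.0 / alpha), n))
```

Because $\hat{\beta}$ is the sample minimum multiplied by a factor below one, this estimate of the scale parameter always lies below the smallest observation, as the support of the Pareto distribution requires.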
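In the simulation, the estimators are compared using two performance indices, total mean square error and total relative deviation. These indices are defined in terms of the true parameters α and β, the number of replications REP, and the parameter estimates $\hat{\alpha}$ and $\hat{\beta}$.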
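A minimal Python sketch of the two simulation indices is given below, assuming a common form: total mean square error as the sum of the mean square errors of $\hat{\alpha}$ and $\hat{\beta}$ over REP replications, and total relative deviation as the sum of the absolute relative deviations of the mean estimates from the true values. Both formulas and the function names are assumptions made for illustration and may differ in detail from the definitions used in the study.

```python
def total_mse(alpha_hats, beta_hats, alpha, beta):
    """Assumed form: MSE(alpha_hat) + MSE(beta_hat) over all replications."""
    rep = len(alpha_hats)
    mse_alpha = sum((a - alpha) ** 2 for a in alpha_hats) / rep
    mse_beta = sum((b - beta) ** 2 for b in beta_hats) / rep
    return mse_alpha + mse_beta

def total_relative_deviation(alpha_hats, beta_hats, alpha, beta):
    """Assumed form: |mean(alpha_hat) - alpha|/alpha + |mean(beta_hat) - beta|/beta."""
    rep = len(alpha_hats)
    mean_alpha = sum(alpha_hats) / rep
    mean_beta = sum(beta_hats) / rep
    return abs(mean_alpha - alpha) / alpha + abs(mean_beta - beta) / beta

# Toy estimates from two replications (illustrative numbers only).
print(total_mse([2.0, 3.0], [1.0, 1.5], alpha=2.0, beta=1.0))
print(total_relative_deviation([2.0, 3.0], [1.0, 1.5], alpha=2.0, beta=1.0))
```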
Because the true parameters are unknown for real-life data sets, total mean square error and total relative deviation cannot be used to assess the estimators in such cases. Therefore, we have used Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE) and Root Mean Square Percentage Error (RMSPE) as performance measures for comparison among the different estimators. These measures are based on $S(x_i)$, the sample (observed) distribution function, and $\hat{F}(x_i) = 1 - (\hat{\beta}/x_i)^{\hat{\alpha}}$, the expected distribution function evaluated with the parameter estimates ($\hat{\alpha}$ and $\hat{\beta}$) from any particular method.

Monte Carlo simulation
For any combination of true parameters (β and α), the Monte Carlo simulation was performed by carrying out the following steps in the R language [40].
• A sample of n uniform random numbers was generated on the interval [0, 1], i.e. U ~ Unif[0, 1].
• The uniform random numbers were converted into Pareto random variables by inverting the Pareto CDF, $X = \beta\,(1-U)^{-1/\alpha}$.
• The process in the above steps was repeated 10,000 times.
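The simulation steps can be sketched as follows. The study reports using R; this Python version is only an illustrative translation that generates Pareto samples by inverse transform and applies the traditional percentile estimator in each replication. The function names, the seed, the quantile rule and the reduced number of replications in the usage line are all assumptions.

```python
import math
import random
import statistics

def rpareto(n, beta, alpha, rng):
    """Draw n Pareto(beta, alpha) values by inverting the CDF."""
    # rng.random() lies in [0, 1), so 1 - U lies in (0, 1] and the power is finite.
    return [beta * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

def pe_estimate(sample):
    """Traditional percentile estimator (PE) based on P25 and P75."""
    p25, _, p75 = statistics.quantiles(sample, n=4)
    alpha_hat = math.log(3.0) / math.log(p75 / p25)
    beta_hat = p25 * 0.75 ** (1.0 / alpha_hat)
    return alpha_hat, beta_hat

def simulate(n, beta, alpha, rep=10000, seed=1):
    """Return the mean PE estimates of (alpha, beta) over rep replications."""
    rng = random.Random(seed)
    estimates = [pe_estimate(rpareto(n, beta, alpha, rng)) for _ in range(rep)]
    mean_alpha = sum(a for a, _ in estimates) / rep
    mean_beta = sum(b for _, b in estimates) / rep
    return mean_alpha, mean_beta

# A smaller number of replications keeps the illustration fast.
print(simulate(n=50, beta=1.0, alpha=2.0, rep=1000))
```

The same loop can be reused for any other estimator by swapping `pe_estimate` for a function that returns a pair of estimates.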

Results and discussion
Tables 1-4 present the results of the Monte Carlo simulation study carried out to evaluate the estimators numerically for different sample sizes and parameter combinations. The results in Table 1 (β = 1, α = 0.5) show that the modified percentile estimator PE-III (based on the expectation of the empirical cumulative distribution function of the first-order statistic) estimated the true parameters more accurately than the traditional percentile estimator and the other modified percentile estimators (based on the median and the geometric mean). Under the total mean square error criterion, PE-III provided more efficient parameter estimates for all sample sizes, as it has lower total mean square error than the competing estimators. The second performance criterion, total relative deviation, leads to the same conclusion for all sample sizes: PE-III is the most efficient of the estimation strategies considered. The traditional percentile estimator is the second-best choice after PE-III.
With regard to the literature on modified estimators, our results agree with other studies favouring the use of modified maximum likelihood, moment and percentile estimation for different probability distributions [25][26][27][28][29].
To avoid repetition, we simply note that PE-III provides more efficient and accurate parameter estimates than the other estimators considered, for all sample sizes, for the parameter combinations (β = 1, α = 1), (β = 1, α = 2) and (β = 2, α = 1) presented in Tables 2-4, respectively. Moreover, the results in Tables 1-4 show that the modified estimator PE-II (based on the geometric mean) is the worst performer on both performance indicators, although its performance improves gradually with increasing sample size. Its poor performance in small samples may be because the geometric mean is influenced by extreme values, which are common in heavy-tailed distributions such as the Pareto.

Real life examples
In addition to the numerical evaluation of the proposed estimators through the simulation study, the modified percentile estimators were applied to two real-life data sets. For comparison, we also used the maximum likelihood and moment estimators of the Pareto distribution. The Maximum Likelihood (ML) estimators of α and β are
$$\hat{\alpha} = \frac{n}{\sum_{i=1}^{n}\ln\left(x_i/\hat{\beta}\right)}, \qquad \hat{\beta} = x_{(1)}.$$
Similarly, estimators from the Method of Moments (MM) were computed.

Example 1: The first example is taken from Clark [9]; it consists of 21 observations on the number of deaths in major earthquakes during 1900-2011, as published by the United States Geological Survey. The results of applying the estimators to Example 1 are presented in Table 5.
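For readers who wish to reproduce the comparison estimators, a Python sketch follows. The ML estimators are the standard ones for the Pareto distribution: the sample minimum for β and $n/\sum \ln(x_i/\hat{\beta})$ for α. The moment estimators shown are only one possible version, obtained by equating the sample mean and the sample minimum to their expectations; this choice is an assumption and may not match the MM expressions used in the study. The function names are hypothetical.

```python
import math

def ml_estimate(sample):
    """Standard Pareto ML estimators: beta_hat = min(x), alpha_hat = n / sum(ln(x_i / beta_hat))."""
    n = len(sample)
    beta_hat = min(sample)
    alpha_hat = n / sum(math.log(x / beta_hat) for x in sample)
    return alpha_hat, beta_hat

def mm_estimate(sample):
    """Assumed moment estimators based on E[X] = alpha*beta/(alpha-1)
    and E[X_(1)] = n*alpha*beta/(n*alpha-1)."""
    n = len(sample)
    x_bar = sum(sample) / n
    x_min = min(sample)
    alpha_hat = (n * x_bar - x_min) / (n * (x_bar - x_min))
    beta_hat = (n * alpha_hat - 1.0) * x_min / (n * alpha_hat)
    return alpha_hat, beta_hat

# Toy values for illustration only (not data from the study).
toy = [1.0, 2.0, 3.0]
print(ml_estimate(toy))
print(mm_estimate(toy))
```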
The results in Table 5 clearly indicate the superiority of PE-III over the other percentile-based estimators as well as the maximum likelihood and moment estimators. All four performance measures are smaller for PE-III than for the other estimators.
Example 2: The second data set is taken from Beirlant et al. [41] and consists of 142 fire damage claims (in thousands of Norwegian kroner) in Norway during 1975. This data set has also been used in other studies of the Pareto distribution [3,42,43]. Table 6 shows that, on three performance indices, the third modified percentile estimator (PE-III) is better than the traditional percentile (PE), maximum likelihood (ML), moment (MM) and other modified percentile estimators (PE-I, PE-II). However, maximum likelihood estimation performs slightly better than PE-III in terms of mean absolute error.
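The four goodness-of-fit measures used for the real data can be sketched in Python as follows. The sketch assumes that the observed distribution function is $S(x_i) = i/n$ for the i-th smallest observation, that the expected distribution function is the fitted Pareto CDF, and that the percentage errors are taken relative to $S(x_i)$ and reported as fractions. These conventions and the function name are assumptions and may differ from those used in the study.

```python
import math

def fit_measures(data, alpha_hat, beta_hat):
    """Return MAE, MAPE, RMSE and RMSPE between the empirical and fitted CDFs."""
    xs = sorted(data)
    n = len(xs)
    abs_err, pct_err = [], []
    for i, x in enumerate(xs, start=1):
        s = i / n  # assumed empirical CDF at the i-th order statistic
        # Fitted Pareto CDF, clamped at 0 for observations below beta_hat.
        f = max(0.0, 1.0 - (beta_hat / x) ** alpha_hat)
        abs_err.append(abs(s - f))
        pct_err.append(abs(s - f) / s)
    return {
        "MAE": sum(abs_err) / n,
        "MAPE": sum(pct_err) / n,
        "RMSE": math.sqrt(sum(e * e for e in abs_err) / n),
        "RMSPE": math.sqrt(sum(e * e for e in pct_err) / n),
    }

# Toy values for illustration only (not data from the study).
print(fit_measures([1.0, 2.0, 4.0], alpha_hat=1.0, beta_hat=1.0))
```

Smaller values of all four measures indicate a fitted distribution function that tracks the observed one more closely, which is how the estimators are ranked in Tables 5 and 6.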

Conclusion
Three modified percentile estimators are proposed for estimating the parameters of the Pareto distribution. The modifications are based on the median, the geometric mean, and the expectation of the empirical cumulative distribution function of the first-order statistic of the Pareto distribution. The proposed estimators are compared with the traditional percentile estimators via Monte Carlo simulation, and the modified percentile estimator based on the expectation of the empirical cumulative distribution function of the first-order statistic is found to perform better than the traditional and other modified percentile estimators in terms of total mean square error and total relative deviation. The simulation results are further corroborated by applying the estimators to two real-life examples, which show that this estimator performs better not only than the other percentile-based estimators but also than the maximum likelihood and moment estimators. Considering the results from both the simulation and the real-data applications, the use of modified percentile estimation can be recommended for estimating the parameters of the Pareto distribution.