Abstract
In this article, a new robust ratio-type estimator using Uk’s redescending M-estimator is proposed for estimating the finite population mean in simple random sampling (SRS) when the dataset contains outliers. The mean square error (MSE) of the proposed estimator is obtained to the first order of approximation and compared with those of the traditional ratio-type estimators in the literature, robust regression estimators, and other existing redescending M-estimators. Real-life data and a simulation study are used to assess the efficiency of the proposed estimators. The proposed estimator is shown to be more efficient than the other estimators in the literature in both the simulation and the real data studies.
Citation: Rather KUI, Koçyiğit EG, Onyango R, Kadilar C (2022) Improved regression in ratio type estimators based on robust M-estimation. PLoS ONE 17(12): e0278868. https://doi.org/10.1371/journal.pone.0278868
Editor: Nadia Hashim Al-Noor, Mustansiriyah University - College of Science, IRAQ
Received: October 18, 2022; Accepted: November 26, 2022; Published: December 12, 2022
Copyright: © 2022 Rather et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper.
Funding: The authors received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Outliers are observations that behave differently from the majority of a dataset and can often significantly affect statistics. In sampling studies, however, outliers cannot be easily detected, since the entire population cannot always be observed. Especially for methods that must work with small samples, the efficiency of estimation decreases if an outlier is included in the sample.
Robust regression methods are generally used to reduce the effect of outliers in real data. M-estimators are used as robust replacements for the classical estimators used in statistics. Uk’s redescending M-estimator was proposed [1] to handle outliers more efficiently than other robust estimation methods. Outliers mainly affect the traditional estimation methods and reduce their efficiency; in particular, the performance of ordinary least squares (OLS) estimators deteriorates in the presence of outliers. Therefore, numerous redescending M-estimation methods have been developed to control the effect of outliers and to improve the efficiency of the procedures, including [2–30]. This study aims to reduce the effect of outliers by developing a new ratio-type estimator based on Uk’s redescending M-estimator (URME) that may improve the efficiency of estimation.
This article is organized as follows. Section 2 introduces the traditional ratio estimators and some existing M-estimator-based ratio estimators in the literature. Section 3 gives brief information about Uk’s redescending M-estimator, presents the proposed estimator, and ends with efficiency comparisons of the proposed estimator. Section 4 calculates the relative efficiencies of the estimators and compares them in theory and in application, using real data and simulation. Finally, Section 5 concludes and offers suggestions for future studies.
Existing estimators in the literature
Kadilar and Cingi (2004) estimators
In simple random sampling, Kadilar and Cingi [31] introduced ratio estimators by adapting the traditional ratio estimator and other ratio-type estimators in the literature [32]. On the basis of MSE equations and numerical illustrations, they showed that their estimators are more efficient than the OLS estimators. These estimators are
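$$\bar{y}_{j} = \frac{\bar{y} + b(\bar{X} - \bar{x})}{\beta_j \bar{x} + \gamma_j}\,(\beta_j \bar{X} + \gamma_j), \quad j = 1, \dots, 5 \tag{1}$$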
where $b$ is the slope coefficient obtained by OLS estimation, $\bar{y}$ is the sample mean of the study variable, $\bar{x}$ is the sample mean of the auxiliary variable, and $\bar{X}$ is the population mean of the auxiliary variable. The constants are $\beta_1 = 1$, $\gamma_1 = 0$; $\beta_2 = 1$, $\gamma_2 = C_x$; $\beta_3 = 1$, $\gamma_3 = \beta_2(x)$; $\beta_4 = \beta_2(x)$, $\gamma_4 = C_x$; and $\beta_5 = C_x$, $\gamma_5 = \beta_2(x)$, where $\beta_2(x)$ and $C_x$ are the population coefficient of kurtosis and the coefficient of variation of the auxiliary variable, respectively. When these population parameters are unknown, they can be estimated from the sample. The MSE of $\bar{y}_j$ is as follows:
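$$MSE(\bar{y}_j) \cong \frac{1-f}{n}\left(R_j^2 S_x^2 + 2BR_jS_x^2 + B^2S_x^2 - 2R_jS_{xy} - 2BS_{xy} + S_y^2\right) \tag{2}$$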
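where $R_j = \beta_j\bar{Y}/(\beta_j\bar{X} + \gamma_j)$, $f = n/N$, $S_x^2$ and $S_y^2$ are the population variances of the auxiliary and study variables, $S_{xy}$ is their population covariance, and $B$ is the expected value of $b$, i.e. $E(b) = B$.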
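Eqs (1) and (2) translate directly into a few lines of code. The following is a minimal Python sketch, assuming NumPy, population variances and covariance with divisor $N - 1$, and the moment definition $\beta_2(x) = \mu_4/\mu_2^2$ for kurtosis (the text does not say which definition is used). The function names `kc_constants`, `kc_estimate` and `kc_mse` are hypothetical helpers, not code from the authors.

```python
import numpy as np


def kc_constants(x_pop):
    """Return the (beta_j, gamma_j) pairs, j = 1..5, used in Eq (1)."""
    x = np.asarray(x_pop, dtype=float)
    d = x - x.mean()
    c_x = x.std(ddof=1) / x.mean()               # coefficient of variation C_x
    b2_x = np.mean(d**4) / np.mean(d**2) ** 2    # kurtosis beta_2(x), moment form (assumed)
    return [(1.0, 0.0), (1.0, c_x), (1.0, b2_x), (b2_x, c_x), (c_x, b2_x)]


def kc_estimate(y_s, x_s, X_bar, slope, beta_j, gamma_j):
    """Regression-in-ratio estimate of the population mean, Eq (1)."""
    y_bar, x_bar = np.mean(y_s), np.mean(x_s)
    return (y_bar + slope * (X_bar - x_bar)) / (beta_j * x_bar + gamma_j) * (beta_j * X_bar + gamma_j)


def kc_mse(y_pop, x_pop, n, B, beta_j, gamma_j):
    """First-order MSE of Eq (2) for a given expected slope B."""
    y = np.asarray(y_pop, dtype=float)
    x = np.asarray(x_pop, dtype=float)
    f = n / y.size                               # sampling fraction n/N
    s_x2, s_y2 = x.var(ddof=1), y.var(ddof=1)    # divisor N - 1
    s_xy = np.cov(x, y)[0, 1]                    # np.cov also uses N - 1 by default
    r_j = beta_j * y.mean() / (beta_j * x.mean() + gamma_j)
    return (1 - f) / n * (r_j**2 * s_x2 + 2 * B * r_j * s_x2 + B**2 * s_x2
                          - 2 * r_j * s_xy - 2 * B * s_xy + s_y2)
```

The robust variants discussed below differ from Eq (1) only in the slope. Passing the expected value of a robust slope in place of `B` therefore gives their first-order MSE.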
It is worth noting that the ratio estimators in Eq (1) are more efficient than other traditional estimators in the literature [33,34]. However, the presence of outliers reduces the efficiency of these estimators. Therefore, Kadilar et al. [35] proposed new ratio estimators based on robust regression for the efficient estimation of the population mean.
Kadilar et al. (2007) [35] Huber M-estimators
Numerous regression methods have been introduced in the literature to deal with outliers in the data. M-estimators were initially developed by Huber [9]. Later, Kadilar et al. [35] used robust regression as a substitute for OLS in ratio estimation. The resulting estimators, named Huber M-estimators (HM), are given as
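$$\bar{y}_{HMj} = \frac{\bar{y} + b_{HM}(\bar{X} - \bar{x})}{\beta_j \bar{x} + \gamma_j}\,(\beta_j \bar{X} + \gamma_j) \tag{4}$$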
where $b_{HM}$ is the slope coefficient of the robust regression M-estimator given by Huber [9]. Huber’s objective function $\rho(r_j)$ is given by
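$$\rho(r_j) = \begin{cases} r_j^2/2, & |r_j| \le c, \\ c|r_j| - c^2/2, & |r_j| > c, \end{cases} \tag{5}$$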
where $r_j$ is the residual and $c$ is the tuning constant.
Huber [9] advises setting $c$ to 1.5 times the estimated standard deviation of the error. The MSE of the HM estimators is given as follows:
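$$MSE(\bar{y}_{HMj}) \cong \frac{1-f}{n}\left(R_j^2 S_x^2 + 2B_{HM}R_jS_x^2 + B_{HM}^2S_x^2 - 2R_jS_{xy} - 2B_{HM}S_{xy} + S_y^2\right) \tag{6}$$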
where $B_{HM}$ is the expected value of $b_{HM}$. That is, Eq (6) is obtained by replacing $B$ in Eq (2) with $B_{HM}$. The MSE values of the M-estimator-based ratio estimators are smaller than those of the OLS-based estimators.
Raza et al. (2019) [36] estimators
Raza et al. [36] proposed ratio estimators based on a newly developed robust redescending M-estimator. These redescending M-estimator (RM) ratio estimators are given by
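$$\bar{y}_{RMj} = \frac{\bar{y} + b_{RM}(\bar{X} - \bar{x})}{\beta_j \bar{x} + \gamma_j}\,(\beta_j \bar{X} + \gamma_j) \tag{7}$$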
where $b_{RM}$ is the slope coefficient of the redescending M-estimator given by Raza et al. [36]. Raza’s objective function $\rho(r_j)$ is described as
(8)
where $c$ and $v$ are tuning constants. In the current study, the optimum values $c = 2.5$ and $v = 8$ are used. Replacing $B$ in Eq (2) with $B_{RM}$, the expected value of $b_{RM}$, gives the MSE of the estimators in Eq (7):
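$$MSE(\bar{y}_{RMj}) \cong \frac{1-f}{n}\left(R_j^2 S_x^2 + 2B_{RM}R_jS_x^2 + B_{RM}^2S_x^2 - 2R_jS_{xy} - 2B_{RM}S_{xy} + S_y^2\right) \tag{9}$$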
Noor-ul-Amin et al. (2020) [37] estimators
Noor-ul-Amin et al. [37] proposed further ratio estimators based on a redescending M-estimator with a different objective function. These estimators are given by
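$$\bar{y}_{NMj} = \frac{\bar{y} + b_{NM}(\bar{X} - \bar{x})}{\beta_j \bar{x} + \gamma_j}\,(\beta_j \bar{X} + \gamma_j) \tag{10}$$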
where $b_{NM}$ is the slope coefficient of the redescending M-estimator given by Noor-ul-Amin et al. [37]. The Noor objective function $\rho_2(r_j)$ is described as
(11)
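The MSE of the estimators in Eq (10) is obtained in the same way, by replacing $B$ in Eq (2) with $B_{NM}$:
$$MSE(\bar{y}_{NMj}) \cong \frac{1-f}{n}\left(R_j^2 S_x^2 + 2B_{NM}R_jS_x^2 + B_{NM}^2S_x^2 - 2R_jS_{xy} - 2B_{NM}S_{xy} + S_y^2\right) \tag{12}$$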
Proposed ratio estimators based on Uk’s redescending M-estimator
Uk’s redescending M-estimator
The proposed estimator is based on Uk’s redescending M-estimator [1]. In general, the M-estimator of $\beta$ is defined by minimizing the following objective function:
$$\hat{\beta}_M = \arg\min_{\beta} \sum_{i=1}^{n} \rho(r_i) \tag{13}$$
where $r_i = y_i - \beta x_i$ denotes the residuals. An objective function must fulfill the following standard properties:
- ρ(0) = 0
- ρ(ri)≥0
- ρ(ri) = ρ(−ri)
- ρ(ri)≥ρ(rj) for |ri|≥|rj|
- ρ is differentiable
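The properties listed above can be checked numerically for any candidate $\rho$. The Python sketch below checks them on a grid. It uses two functions: Huber’s objective from Eq (5), and Tukey’s biweight as a stand-in redescending function. The biweight is not one of the estimators compared in this paper. The tuning constants 1.345 and 4.685 are the conventional 95%-efficiency choices, not values taken from this study. Differentiability cannot be established on a grid, so it is not checked. The helper names are hypothetical.

```python
import numpy as np


def huber_rho(r, c=1.345):
    """Huber objective (Eq (5)): quadratic for |r| <= c, linear beyond."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * a**2, c * a - 0.5 * c**2)


def bisquare_rho(r, c=4.685):
    """Tukey biweight objective: constant c^2/6 once |r| >= c (redescending)."""
    u = np.minimum(np.abs(r) / c, 1.0)
    return c**2 / 6.0 * (1.0 - (1.0 - u**2) ** 3)


def check_rho(rho, grid=None):
    """Check the grid-testable properties of rho (not differentiability)."""
    if grid is None:
        grid = np.linspace(-10.0, 10.0, 2001)
    vals = rho(grid)
    pos = np.sort(np.abs(grid))
    return {
        "rho(0) = 0": bool(np.isclose(rho(0.0), 0.0)),
        "rho(r) >= 0": bool(np.all(vals >= 0)),
        "rho(r) = rho(-r)": bool(np.allclose(vals, rho(-grid))),
        "non-decreasing in |r|": bool(np.all(np.diff(rho(pos)) >= -1e-12)),
    }


def psi_numeric(rho, r, h=1e-6):
    """Central-difference approximation of psi(r) = d rho / d r."""
    return (rho(r + h) - rho(r - h)) / (2 * h)
```

Both functions pass the checks, but their derivatives differ for large residuals. Huber’s ψ stays at $c$ (`psi_numeric(huber_rho, 8.0)` is about 1.345). The biweight’s ψ is exactly zero beyond $c$, which is the redescending behaviour defined in the next paragraph.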
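An M-estimator is called a redescending M-estimator if its objective function satisfies these properties and its ψ-function, the derivative of the objective function, tends to zero as $|r_i|$ grows, so that gross outliers receive little or no weight. Differentiating the objective function in Eq (13) with respect to $r_i$, we obtain the ψ-function:
$$\psi(r_i) = \frac{\partial \rho(r_i)}{\partial r_i} \tag{14}$$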
Dividing $\psi(r_i)$ by $r_i$, we obtain the weight function:
$$w(r_i) = \frac{\psi(r_i)}{r_i} \tag{15}$$
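The weight function in Eq (15) is what makes M-estimation computable by iteratively reweighted least squares (IRLS). Residuals are standardized by a robust scale, turned into weights $w(r_i) = \psi(r_i)/r_i$, and a weighted least-squares fit is repeated until the coefficients stop changing. The Python sketch below assumes a model with an intercept and a MAD scale estimate. Uk’s weight function (Eq (18)) is not reproduced here, so the example uses Huber’s weight and Tukey’s biweight as stand-ins. Any function with the same signature could be substituted. The names `irls_fit`, `huber_weight` and `bisquare_weight` are hypothetical.

```python
import numpy as np


def huber_weight(u, c=1.345):
    """Monotone Huber weight psi(u)/u: 1 inside [-c, c], c/|u| outside."""
    a = np.abs(u)
    return np.where(a <= c, 1.0, c / np.maximum(a, c))


def bisquare_weight(u, c=4.685):
    """Tukey biweight, a redescending weight (exactly zero for |u| >= c)."""
    return np.where(np.abs(u) < c, (1.0 - (u / c) ** 2) ** 2, 0.0)


def irls_fit(x, y, weight_fn, start=None, max_iter=200, tol=1e-10):
    """Fit y = a + b*x by IRLS with the given weight function; returns (a, b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    # OLS start unless a (more robust) starting fit is supplied
    coef = np.linalg.lstsq(X, y, rcond=None)[0] if start is None else np.asarray(start, dtype=float)
    for _ in range(max_iter):
        r = y - X @ coef
        s = np.median(np.abs(r - np.median(r))) / 0.6745   # MAD estimate of scale
        if s == 0:
            break
        sw = np.sqrt(weight_fn(r / s))                      # sqrt of weights for weighted LS
        new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        done = np.max(np.abs(new - coef)) < tol
        coef = new
        if done:
            break
    return coef
```

The redescending fit is best started from the Huber fit, for example `irls_fit(x, y, bisquare_weight, start=irls_fit(x, y, huber_weight))`. A redescending objective is non-convex, and an OLS start that has been pulled by outliers can give good observations zero weight.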
Following the procedure behind Eqs (5), (8) and (11), we adopt the redescending M-estimator of [1]. Its objective function is
(16)
Differentiating Eq (16) with respect to $r_i$, we get the ψ-function as
(17)
Dividing $\psi(r_i)$ by the residual $r_i$, we obtain the weight function as
(18)
The graphs of the objective ρ-function, ψ-function, and weight function are shown in Fig 1A–1C, respectively.
Fig 1. Graphs of the functions of Uk’s M-estimator: (A) objective function, (B) ψ-function and (C) weight function.
Proposed estimator
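Motivated by the estimators in [31,35–37] and using Uk’s redescending M-estimator [1], we define the proposed estimators as follows:
$$\bar{y}_{UKj} = \frac{\bar{y} + b_{UK}(\bar{X} - \bar{x})}{\beta_j \bar{x} + \gamma_j}\,(\beta_j \bar{X} + \gamma_j) \tag{19}$$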
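The MSE of the estimators in Eq (19) is
$$MSE(\bar{y}_{UKj}) \cong \frac{1-f}{n}\left(R_j^2 S_x^2 + 2B_{UK}R_jS_x^2 + B_{UK}^2S_x^2 - 2R_jS_{xy} - 2B_{UK}S_{xy} + S_y^2\right) \tag{20}$$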
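where $B_{UK}$ is calculated from the objective function in Eq (16), $R_j = \beta_j\bar{Y}/(\beta_j\bar{X} + \gamma_j)$, $S_x^2 = \sum_{i=1}^{N}(X_i - \bar{X})^2/(N-1)$, $S_y^2 = \sum_{i=1}^{N}(Y_i - \bar{Y})^2/(N-1)$, and $S_{xy} = \sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})/(N-1)$.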
To evaluate the efficiency of the proposed ratio estimator, MSE equations of the estimators will be compared in Section 3.3.
Efficiency comparisons
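For the theoretical comparisons, the proposed estimator is first compared with the traditional estimator of Kadilar and Cingi [31]. From Eqs (2) and (20), the proposed estimator is more efficient when
$$MSE(\bar{y}_j) - MSE(\bar{y}_{UKj}) = \frac{1-f}{n}\,(B - B_{UK})\,S_x^2\left(B + B_{UK} + 2R_j - 2b\right) > 0, \tag{21}$$
where $b = S_{xy}/S_x^2$ is the least-squares slope obtained by the OLS method.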
Eq (21) extends to any competing estimator by replacing $B$ with $B^*$, which can be $B$, $B_{HM}$, $B_{RM}$ or $B_{NM}$. The proposed estimator is then more efficient than the competitor if
- (I) $B^* > B_{UK}$
- (II) $B^* + B_{UK} > 2(b - R_j)$
If Conditions (I) and (II) are satisfied for every $B^*$, the proposed estimator is the most efficient of the estimators considered.
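A quick way to confirm that Conditions (I) and (II) follow from the MSE expressions is to compute the MSE gap both directly and in factored form. The Python sketch below assumes the first-order MSE of Eq (2), rewritten as $S_y^2 - 2(B^*+R_j)S_{xy} + (B^*+R_j)^2 S_x^2$, with $S_{xy} = bS_x^2$. The function names are hypothetical.

```python
def mse_first_order(b_star, r_j, b, s_x2, s_y2, n, N):
    """First-order MSE (Eq (2) form) for slope B*, with S_xy = b * S_x^2."""
    f = n / N
    s_xy = b * s_x2
    return (1 - f) / n * (s_y2 - 2 * (b_star + r_j) * s_xy + (b_star + r_j) ** 2 * s_x2)


def mse_gap(b_star, b_uk, b, r_j, s_x2, n, N):
    """Factored difference MSE(B*) - MSE(B_UK)."""
    return (1 - n / N) / n * (b_star - b_uk) * s_x2 * (b_star + b_uk + 2 * r_j - 2 * b)


def uk_more_efficient(b_star, b_uk, b, r_j):
    """Conditions (I) and (II): together they make the gap positive."""
    return b_star > b_uk and b_star + b_uk > 2 * (b - r_j)
```

The factorization shows why two conditions are needed. The gap is a product of $(B^* - B_{UK})$ and $(B^* + B_{UK} + 2R_j - 2b)$, and Conditions (I) and (II) make both factors positive.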
Numerical comparisons
Real data studies
To demonstrate the efficiency of the proposed estimators, two real datasets are considered. The first is the apple production dataset from the Black Sea Region of Turkey [35]. Apple production (in tons) is taken as the study variable, and the number of trees (1 unit = 1000 trees) in 204 villages is taken as the auxiliary variable. Table 1 shows the population parameters of the first dataset. Fig 2 shows its scatter plot, in which the outliers are clearly visible.
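For the comparison, the reference estimator is the traditional ratio estimator $\bar{y}_j$ in Eq (1). The percent relative efficiency (PRE) is computed as
$$PRE(\bar{y}_{*j}, \bar{y}_j) = \frac{MSE(\bar{y}_j)}{MSE(\bar{y}_{*j})} \times 100 \tag{22}$$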
where $* = HM, RM, NM$ and $UK$. A total of 10000 samples of size $n = 30$ were drawn from the population of size $N = 204$. The PREs were calculated using Eq (22), and the resulting values are given in Table 2. The best estimators are marked with "*" in the table.
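The PRE in Eq (22) can be approximated by Monte Carlo. Repeated samples are drawn by SRS without replacement, each estimator is computed on every sample, and the ratio of the empirical MSEs is taken. The Python sketch below assumes that the reference is Eq (1) with a per-sample OLS slope. The robust slopes $b_{HM}$, $b_{RM}$, $b_{NM}$ and $b_{UK}$ are not reproduced here. As a hypothetical stand-in, the sketch uses the Theil–Sen slope (the median of pairwise slopes), which is not one of the estimators compared in the paper. All function names are illustrative.

```python
import numpy as np


def ols_slope(xs, ys):
    """Per-sample OLS slope."""
    return np.polyfit(xs, ys, 1)[0]


def theil_sen_slope(xs, ys):
    """Median of pairwise slopes: a simple robust stand-in slope."""
    i, j = np.triu_indices(len(xs), k=1)
    dx = xs[j] - xs[i]
    keep = dx != 0
    return np.median((ys[j] - ys[i])[keep] / dx[keep])


def ratio_est(ys, xs, X_bar, slope, beta_j, gamma_j):
    """Regression-in-ratio estimate of Eq (1) with the supplied slope."""
    return (ys.mean() + slope * (X_bar - xs.mean())) / (beta_j * xs.mean() + gamma_j) * (beta_j * X_bar + gamma_j)


def empirical_pre(y, x, n, ref_slope, alt_slope, beta_j=1.0, gamma_j=0.0, reps=10000, seed=0):
    """Monte Carlo PRE: 100 * MSE(reference) / MSE(alternative)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    Y_bar, X_bar = y.mean(), x.mean()
    err_ref = np.empty(reps)
    err_alt = np.empty(reps)
    for k in range(reps):
        idx = rng.choice(y.size, size=n, replace=False)   # SRS without replacement
        ys, xs = y[idx], x[idx]
        err_ref[k] = ratio_est(ys, xs, X_bar, ref_slope(xs, ys), beta_j, gamma_j) - Y_bar
        err_alt[k] = ratio_est(ys, xs, X_bar, alt_slope(xs, ys), beta_j, gamma_j) - Y_bar
    return 100.0 * np.mean(err_ref**2) / np.mean(err_alt**2)
```

Both estimators are evaluated on the same samples, so the comparison is paired. A value above 100 means the alternative has the smaller empirical MSE.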
The second real dataset concerns U.S. state public-school expenditures. It consists of 51 observations giving per-capita income and per-capita education expenditure, in dollars, for the U.S. states in 1970 [38]. Per-capita income is taken as the study variable and per-capita education expenditure as the auxiliary variable. Because the original data contain no outliers, 7% outliers were added, as in Raza et al. [36]. Scatter plots of the original and outlier-added data are given in Fig 3, and the parameters of each population are given in Table 3. All calculations follow those for the first dataset. The resulting PRE values are given in Table 4, with the best estimators marked with “*”. Table 4 shows that the proposed estimators are considerably more efficient than the other estimators, especially for the outlier-added data.
A comparison of the proposed estimators with each other for all real datasets used is summarized in Fig 4. Accordingly, it can be inferred that among the proposed estimators, is the most effective one in general.
In all of the real datasets used, the proposed estimator is found to be the most efficient. Regarding Condition (I), Tables 1 and 3 show that $B_{UK}$ is lower than the other B values. The values in Tables 1 and 3 also show that Condition (II), derived from Eq (21), is satisfied, as summarized in Table 5.
Simulation study
A simulation study is also conducted to examine the superiority of the proposed estimator. Data are generated in R from the normal distribution, representing symmetric distributions, and from the exponential distribution, representing skewed distributions. Results are calculated from 10000 SRS samples drawn without replacement. Sample sizes of $n = 20, 30, 40$ and $50$ and outlier rates of 0.05 and 0.1 are considered. The following regression model is used to generate the data:
$$y_i = \alpha + b x_i + e_i,$$
where $e_i$ denotes the error term, $\alpha = 2$ and $b = 1$.
To verify the efficiency of the proposed estimator, 95% of the study variable is generated from N(20,10) and the remaining 5% from N(50,10) as outliers. Similarly, for the skewed distribution, 95% of the study variable is generated from Exp(3) and 5% from Exp(15). The residuals are generated with the same proportions, from N(0,1) and N(30,1) for the symmetric case and from Exp(1) and Exp(5) for the skewed case. As suggested in the literature, the tuning constants are taken as 1.5 for Huber and 3 for NM and UK. The simulation is repeated for 10% outlier data as well. The calculated B coefficients for both distributions are given in Table 6. The PRE values, calculated using Eq (22), are given in Table 7, with the best estimators marked as before. The PRE values of the proposed estimators for both distributions are also presented in Fig 5.
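Below is one way the contaminated populations could be generated. The Python sketch relies on assumptions the text leaves open, so it is not the authors’ R code:

- The mixture is taken to generate the auxiliary variable $x$, and $y$ then follows $y_i = \alpha + b x_i + e_i$ with $\alpha = 2$ and $b = 1$.
- The second argument of N(·,·) is treated as a standard deviation.
- Exp(θ) is taken to have mean θ.
- The same outlier indicator is shared by $x$ and the errors.

The function name `contaminated_population` is hypothetical.

```python
import numpy as np


def contaminated_population(N, outlier_rate=0.05, dist="normal", alpha=2.0, b=1.0, seed=0):
    """Return (x, y, is_outlier) for a two-component mixture population."""
    rng = np.random.default_rng(seed)
    is_out = np.zeros(N, dtype=bool)
    is_out[: int(round(outlier_rate * N))] = True
    rng.shuffle(is_out)                                  # random outlier positions
    if dist == "normal":
        # assumed: N(mean, sd); clean N(20,10) vs outlying N(50,10)
        x = np.where(is_out, rng.normal(50, 10, N), rng.normal(20, 10, N))
        e = np.where(is_out, rng.normal(30, 1, N), rng.normal(0, 1, N))
    elif dist == "exponential":
        # assumed: Exp(theta) with mean theta (NumPy's scale parameter)
        x = np.where(is_out, rng.exponential(15, N), rng.exponential(3, N))
        e = np.where(is_out, rng.exponential(5, N), rng.exponential(1, N))
    else:
        raise ValueError("dist must be 'normal' or 'exponential'")
    y = alpha + b * x + e                                # regression model of the simulation
    return x, y, is_out
```

Each call fixes one finite population. The repeated samples of size $n$ are then drawn from it without replacement.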
Conclusion
In simple random sampling, under suitable conditions, ratio estimators are employed to estimate the population mean efficiently. M-estimators, on the other hand, are developed for data containing outliers. Studies in the literature show that combining ratio estimators with M-estimators gives more efficient results in the presence of outliers. Outliers violate the OLS assumptions, so OLS-based ratio estimators lose precision. We present a ratio estimator based on Uk’s redescending M-estimator to address this problem. Real-life data examples and a simulation study support the efficiency of the proposed estimator.
In the real data studies, the proposed estimators are notably more efficient than the others, and the efficiency of the robust estimators increased as the number of outliers in the data increased. The most striking result concerns the original public-school expenditures dataset. On these data, which contain no outliers, the other estimators are even less efficient than the reference, non-robust ratio estimator. Even so, the proposed estimator remains the most efficient.
The simulation results support the real data findings. As the number of outliers increases, the efficiency of the robust estimators increases, and the proposed estimators are again the most efficient. Efficiency was lower under the skewed distribution than under the symmetric distribution. In both the real data and simulation studies, the conditions required for the proposed estimator to be more efficient are satisfied, which supports its usability in practice. Therefore, the proposed estimator is the most efficient estimator in all numerical studies. When the proposed estimators were compared among themselves, it was seen that was superior to the others. However, this estimator requires more population parameters of the auxiliary variable. If only the mean of the auxiliary variable is known, the $\bar{y}_{UK1}$ estimator ($\beta_1 = 1$, $\gamma_1 = 0$), which requires only $\bar{X}$, can be used as a more efficient alternative to the other estimators in the literature.
For future studies, the proposed estimator could be examined under other sampling designs, such as systematic, stratified or ranked set sampling, in the same way as under SRS. Alternatively, other ratio estimators based on Uk’s redescending M-estimator could be proposed.
Acknowledgments
The authors wish to thank the anonymous referee for the careful reading and constructive suggestions, which led to an improvement over an earlier version of the paper.
References
- 1. Khalil U, Ali A, Khan DM, Khan SA, Qadir F. Efficient Uk’s Redescending M-Estimator for Robust Regression. Pakistan Journal of Statistics. 2016;32(2): 125–138.
- 2. Beaton AE, Tukey JW. The Fitting of Power Series, Meaning Polynomials, Illustrated on Band-Spectroscopic Data. Technometrics. 1974;16(2): 147–185.
- 3. Andrews DF. A Robust Method for Multiple Linear Regression. Technometrics. 1974;16(4): 523–531.
- 4. Andrews DF, Hampel FR. Robust Estimates of Location: Festschrift Wilks (Samuel S): Survey and Advances. Princeton University Press; 2015.
- 5. Qadir MF. Robust Method for Detection of Single and Multiple Outliers. Sci. Khyber. 1996;9(2): 135–144.
- 6. Ali A, Qadir MF. A Modified M-Estimator for the Detection of Outliers. Pakistan J. Stat. Oper. Res. 2005;1(1): 49–64.
- 7. Ali A, Khan SA, Khan DM, Khalil U. A New Efficient Redescending M-Estimator: Alamgir Redescending M-Estimator. Res. J. Recent Sci. 2013;2(8): 79–91.
- 8. Hampel FR, Ronchetti EM, Rousseeuw P, Stahel WA. Robust Statistics: the Approach based on Influence Functions. Wiley-Interscience, New York; 1986.
- 9. Huber PJ. Robust Estimation of a Location Parameter. Ann. Math. Stat. 1964;35(1): 73–101.
- 10. Huber PJ, Ronchetti EM. Robust Statistics. John Wiley & Sons, New York; 1981.
- 11. Ullah I, Qadir MF, Ali A. Insha’s Redescending M-estimator for Robust Regression: A Comparative Study. Pakistan J. Stat. Oper. Res. 2006; 2(2): 135–144.
- 12. Noor-ul-amin M, Asghar SUD, Sanaullah A, Shehzad MA. Redescending M-Estimator for Robust Regression. Journal of Reliability and Statistical Studies. 2018;11(2): 69–80.
- 13. Bachmaier M. Consistency of completely outlier-adjusted simultaneous redescending M-estimators of location and scale. AStA Advances in Statistical Analysis. 2007;91(2): 197–219.
- 14. Benseradj H, Guessoum Z. Strong uniform consistency rate of an M-estimator of regression function for incomplete data under α-mixing condition. Communications in Statistics-Theory and Methods. 2022;51(7): 2082–2115.
- 15. Ebele AS, Iheanyi OS. The Performance of Redescending M-Estimators when Outliers are in Two Dimensional Space. Earthline Journal of Mathematical Sciences. 2022;8(2): 295–304.
- 16. Gu J, Fan Y, Yin G. Reconstructing the Kaplan–Meier Estimator as an M-estimator. The American Statistician. 2022;76(1): 37–43.
- 17. Subzar M, Bouza-Herrera CN. Ratio estimator under rank set sampling scheme using huber m in case of outliers. Investigación Operacional. 2021; 42(4): 469–477.
- 18. Ullah I, Qadir MF, Ali A. Insha’s Redescending M-Estimator for Robust Regression: A Comparative Study. Pakistan Journal of Statistics and Operation Research. 2006; 135–144.
- 19. Bhushan S, Kumar A. Novel log type class of estimators under ranked set sampling. Sankhya B. 2022;84:421–447.
- 20. Bhushan S, Kumar A, Lone SA. On some novel classes of estimators under ranked set sampling. AEJ-Alexandria Engineering Journal. 2022;61:5465–5474.
- 21. Bhushan S, Kumar A, Singh S. Some efficient classes of estimators under stratified sampling. Communications in Statistics-Theory and Methods. 2021: 1–30.
- 22. Bhushan S, Kumar A, Akhtar MT, Lone SA. Logarithmic type predictive estimators under simple random sampling. AIMS Mathematics. 2022;7(7): 11992–12010.
- 23. Bhushan S, Kumar A, Onyango R, Singh S. Some improved classes of estimators in stratified sampling using bivariate auxiliary information. Journal of Probability and Statistics. 2022;2: 1–23.
- 24. Bhushan S, Kumar A, Pandey AP, Singh S. Estimation of population mean in presence of missing data under simple random sampling. Communications in Statistics-Simulation and Computation. 2021: 1–22.
- 25. Bhushan S, Kumar A, Shahab S, Lone SA, Almutlak SA. Modified class of estimators using ranked set sampling. Mathematics. 2022;10(3921): 1–13.
- 26. Bhushan S, Kumar A, Shahab S, Lone SA, Akhtar MT. On efficient estimation of population mean under stratified ranked set sampling. Journal of Mathematics. 2022;3: 1–20.
- 27. Bhushan S, Kumar A, Singh S, Kumar S. An improved class of estimators of population mean under simple random sampling. Philippine Statistician. 2021;70(1): 33–47.
- 28. Shahzad U, Al-Noor NH, Hanif M, Sajjad I, Anas MM. Imputation based mean estimators in case of missing data utilizing robust regression and variance–covariance matrices. Communications in Statistics-Simulation and Computation. 2020;51(8): 4276–4295.
- 29. Shahzad U, Hanif M, Sajjad I, Anas MM. Quantile regression-ratio-type estimators for mean estimation under complete and partial auxiliary information. Scientia Iranica. 2020;29(3): 1705–1715.
- 30. Yasin S, Salem S, Ayed H, Kamal S, Suhail M, Khan YA. Modified Robust Ridge M-Estimators in Two-Parameter Ridge Regression Model. Mathematical Problems in Engineering. 2021.
- 31. Kadilar C, Cingi H. Ratio Estimators in Simple Random Sampling. Appl. Math. Comput. 2004;151(3): 893–902.
- 32. Ray SK, Singh RK. Difference-cum-ratio type estimators. J. Ind. Stat. Assoc. 1981; 19: 147–151.
- 33. Sisodia BVS, Dwivedi VK. Modified Ratio Estimator Using Coefficient of Variation of Auxiliary Variable. Journal-Indian Society of Agricultural Statistics. 1981;33: 13–18.
- 34. Upadhayaya LN, Singh HP. Use of Transformed Auxiliary Variable in Estimating the Finite Population Mean. Biometrical Journal. 1999;41(5): 627–636.
- 35. Kadilar C, Candan M, Cingi H. Ratio Estimators Using Robust Regression. Hacettepe Journal of Mathematics and Statistics. 2007;36(2): 181–188.
- 36. Raza A, Noor-ul-amin M, Hanif M. Regression-in-Ratio Estimators in The Redescending M-Estimator. Journal of Reliability and Statistical Studies. 2019;12(2): 1–10.
- 37. Noor-ul-amin M, Asghar S, Sanaullah A. Ratio Estimators in the Presence of Outliers Using Redescending M-Estimator. Proceedings of the National Academy of Sciences, India Section A: Physical Sciences. 2020; 1–6.
- 38. Fox J. Applied Regression Analysis and Generalized Linear Models. Third Edition. Sage, Los Angeles; 2016.