A simulation study: Using dual ancillary variable to estimate population mean under stratified random sampling

  • Sohaib Ahmad,

    Roles Conceptualization

    Affiliation Department of Statistics, Abdul Wali Khan University Mardan, Pakistan

  • Sardar Hussain,

    Roles Formal analysis

    shussain@stat.qau.edu.pk

    Affiliation Department of Statistics, Quaid-i-Azam University, Islamabad, Pakistan

  • Uzma Yasmeen,

    Roles Investigation

    Affiliation Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan

  • Muhammad Aamir,

    Roles Conceptualization, Formal analysis, Methodology, Project administration, Supervision, Validation

    Affiliation Department of Statistics, Abdul Wali Khan University Mardan, Pakistan

  • Javid Shabbir,

    Roles Project administration

    Affiliation Department of Statistics, Quaid-i-Azam University, Islamabad, Pakistan

  • M. El-Morshedy,

    Roles Funding acquisition, Resources, Writing – review & editing

    Affiliations Department of Mathematics, College of Science and Humanities in Al-Kharj, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia, Faculty of Science, Department of Mathematics, Mansoura University, Mansoura, Egypt

  • Afrah Al-Bossly,

    Roles Funding acquisition, Resources, Validation, Writing – review & editing

    Affiliation Department of Mathematics, College of Science and Humanities in Al-Kharj, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia

  • Zubair Ahmad

    Roles Funding acquisition, Validation, Visualization, Writing – review & editing

    Affiliation Department of Statistics, Yazd University, Yazd, Iran

Abstract

In this paper, we propose an improved ratio-in-regression type estimator for the finite population mean under stratified random sampling, using the ancillary variable as well as the rank of the ancillary variable. Expressions for the bias and mean square error of the estimators are derived up to the first order of approximation. The present work focuses on the proper use of the ancillary variable and discusses how it can improve the precision of the estimates. Two real data sets and a simulation study are used to assess the performance of the estimators. We demonstrate theoretically and numerically that the proposed estimator performs better than all the existing estimators considered.

1. Introduction

In sampling theory, appropriate use of ancillary information can increase the precision of estimators. Numerous authors have used ancillary information at the design stage and at the estimation stage. One purpose of sample survey theory is to estimate unknown population parameters of the study variable, such as the mean, median, proportion, and variance. When the population is heterogeneous, stratified random sampling is preferable to simple random sampling.

In stratified random sampling, the heterogeneous population is divided into non-overlapping strata (groups), and sampling is carried out separately in each stratum. According to Zaman and Kadilar [1], stratification improves the efficiency of the estimates when the variation between strata is considerably greater than the variation within strata. Zaman and Kadilar [2] and Zaman [3] provided efficient estimators of the population mean using auxiliary information in stratified random sampling. Zaman [4] proposed an efficient exponential-type estimator of the population mean under stratified random sampling. Rather and Kadilar [5] introduced a dual to ratio-cum-product type exponential estimator under stratified random sampling. Mradula et al. [6] obtained efficient estimators of the population mean under stratified random sampling with a linear cost function. Javed et al. [7] carried out a simulation-based study of the progressive estimation of the population mean through traditional and non-traditional measures in stratified random sampling. Javed and Irfan [8] used a simulation study to examine new optimal estimators of the population mean based on dual auxiliary information in stratified random sampling. Yadav and Tailor [9] estimated the finite population mean using two auxiliary variables under stratified random sampling. Zaman and Bulut [10] proposed modified regression estimators using robust regression methods and covariance matrices in stratified random sampling. Kumar and Vishwakarma [11] proposed generalized classes of regression-cum-ratio estimators of the population mean under stratified random sampling. Zahid et al. [12] provided a generalized class of estimators for a sensitive variable in the presence of measurement error and non-response under stratified random sampling.
Other important references on estimation using auxiliary information include Aladag and Cingi [13], Grover and Kaur [14], Shabbir and Gupta [15], Singh and Khalid [16,17], Kadilar and Cingi [18,19], Koyuncu and Kadilar [20,21], and Singh and Vishwakarma [22]. Recently, Hussain et al. [23] estimated the finite population distribution function with dual use of auxiliary information under simple and stratified random sampling.

Estimators that make dual use of ancillary information have recently emerged as a way of obtaining more efficient estimators under stratified random sampling. In this paper, we develop a new efficient estimator of the finite population mean that makes dual use of an ancillary variable under stratified random sampling.

Consider a finite population $V = \{V_1, V_2, \ldots, V_N\}$ of $N$ distinct and identifiable units, divided into $L$ homogeneous strata of sizes $N_h$ $(h = 1, 2, \ldots, L)$ such that $\sum_{h=1}^{L} N_h = N$. A simple random sample of size $n_h$ is drawn without replacement from the $h$th stratum such that $\sum_{h=1}^{L} n_h = n$.

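Let $y$, $x$, and $r_x$ denote the study variable, the auxiliary variable, and the rank of the auxiliary variable, respectively, taking values $y_{hi}$, $x_{hi}$, and $r_{hi}$ for the $i$th unit in the $h$th stratum. Let $\bar Y_h = \frac{1}{N_h}\sum_{i=1}^{N_h} y_{hi}$, $\bar X_h = \frac{1}{N_h}\sum_{i=1}^{N_h} x_{hi}$, and $\bar R_h = \frac{1}{N_h}\sum_{i=1}^{N_h} r_{hi}$ be the stratum means. Let $\bar y_{st} = \sum_{h=1}^{L} W_h \bar y_h$, $\bar x_{st} = \sum_{h=1}^{L} W_h \bar x_h$, and $\bar r_{st} = \sum_{h=1}^{L} W_h \bar r_h$ be the stratified sample means of $y$, $x$, and $r_x$, respectively, where $\bar y_h$, $\bar x_h$, and $\bar r_h$ are the sample stratum means and $W_h = N_h/N$ are the known stratum weights. Let $\bar Y = \sum_{h=1}^{L} W_h \bar Y_h$, $\bar X = \sum_{h=1}^{L} W_h \bar X_h$, and $\bar R = \sum_{h=1}^{L} W_h \bar R_h$ be the population means of $y$, $x$, and $r_x$, respectively.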

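We also define the following error terms:

$$\varepsilon_0 = \frac{\bar y_{st} - \bar Y}{\bar Y}, \qquad \varepsilon_1 = \frac{\bar x_{st} - \bar X}{\bar X}, \qquad \varepsilon_2 = \frac{\bar r_{st} - \bar R}{\bar R},$$

such that $E(\varepsilon_i) = 0$ for $i = 0, 1, 2$. Using this notation, and writing $\gamma_h = \frac{1}{n_h} - \frac{1}{N_h}$, we have

$$E(\varepsilon_0^2) = \frac{1}{\bar Y^2}\sum_{h=1}^{L} W_h^2 \gamma_h S_{yh}^2, \qquad E(\varepsilon_1^2) = \frac{1}{\bar X^2}\sum_{h=1}^{L} W_h^2 \gamma_h S_{xh}^2, \qquad E(\varepsilon_2^2) = \frac{1}{\bar R^2}\sum_{h=1}^{L} W_h^2 \gamma_h S_{rh}^2,$$

$$E(\varepsilon_0\varepsilon_1) = \frac{1}{\bar Y \bar X}\sum_{h=1}^{L} W_h^2 \gamma_h S_{yxh}, \qquad E(\varepsilon_0\varepsilon_2) = \frac{1}{\bar Y \bar R}\sum_{h=1}^{L} W_h^2 \gamma_h S_{yrh}, \qquad E(\varepsilon_1\varepsilon_2) = \frac{1}{\bar X \bar R}\sum_{h=1}^{L} W_h^2 \gamma_h S_{xrh}.$$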

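Let $C_{yh} = S_{yh}/\bar Y_h$, $C_{xh} = S_{xh}/\bar X_h$, and $C_{rh} = S_{rh}/\bar R_h$ be the coefficients of variation of $y$, $x$, and $r_x$, and let $\rho_{yxh} = \frac{S_{yxh}}{S_{yh}S_{xh}}$, $\rho_{yrh} = \frac{S_{yrh}}{S_{yh}S_{rh}}$, and $\rho_{xrh} = \frac{S_{xrh}}{S_{xh}S_{rh}}$ be the population correlation coefficients between $(y, x)$, $(y, r_x)$, and $(x, r_x)$, respectively, in the $h$th stratum. Here $S_{yh}^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}(y_{hi} - \bar Y_h)^2$, $S_{xh}^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}(x_{hi} - \bar X_h)^2$, and $S_{rh}^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}(r_{hi} - \bar R_h)^2$ are the population variances in the $h$th stratum, with $S_{yh}$, $S_{xh}$, and $S_{rh}$ the corresponding standard deviations, and $S_{yxh} = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}(y_{hi} - \bar Y_h)(x_{hi} - \bar X_h)$, $S_{yrh} = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}(y_{hi} - \bar Y_h)(r_{hi} - \bar R_h)$, and $S_{xrh} = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}(x_{hi} - \bar X_h)(r_{hi} - \bar R_h)$ are the covariances between the respective variables.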
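To make the notation concrete, the following is a minimal sketch under stated assumptions: given a complete stratified population, it computes $W_h = N_h/N$, $\gamma_h = 1/n_h - 1/N_h$, the stratum variance-covariance matrices of $(y, x, r_x)$ with divisor $N_h - 1$, and the first-order moments $E(\varepsilon_i\varepsilon_j) = \sum_h W_h^2\gamma_h S_{ijh}/(M_i M_j)$ with $M = (\bar Y, \bar X, \bar R)$. Assumptions: the ranks are taken over the whole population with ties averaged (the ranking convention is not stated here), the population is synthetic, and the function names `rank_auxiliary`, `stratum_summaries`, and `error_moments` are hypothetical.

```python
import numpy as np


def rank_auxiliary(x):
    """Rank x over the whole population (1 = smallest); tied values get their average rank.

    Assumption: ranking over the whole population (not within strata) is illustrative.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")  # stable sort keeps tied values adjacent
    ranks = np.empty(x.size, dtype=float)
    ranks[order] = np.arange(1, x.size + 1)
    for value in np.unique(x):
        tied = x == value
        if tied.sum() > 1:
            ranks[tied] = ranks[tied].mean()
    return ranks


def stratum_summaries(y, x, strata, n_h):
    """Return W_h, gamma_h, the 3x3 matrices S_h of (y, x, r_x), and (Ybar, Xbar, Rbar)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    strata = np.asarray(strata)
    r = rank_auxiliary(x)
    N = y.size
    W, gamma, S = [], [], []
    for h, label in enumerate(np.unique(strata)):
        in_h = strata == label
        N_h = in_h.sum()
        W.append(N_h / N)
        gamma.append(1.0 / n_h[h] - 1.0 / N_h)
        # np.cov uses divisor N_h - 1: diagonal = S_yh^2, S_xh^2, S_rh^2; off-diagonal = covariances
        S.append(np.cov(np.vstack([y[in_h], x[in_h], r[in_h]])))
    means = np.array([y.mean(), x.mean(), r.mean()])  # equals sum_h W_h * stratum mean
    return np.array(W), np.array(gamma), np.array(S), means


def error_moments(W, gamma, S, means):
    """First-order E(eps_i * eps_j) for i, j = 0 (y), 1 (x), 2 (r_x)."""
    V = np.einsum("h,hij->ij", W**2 * gamma, S)  # sum_h W_h^2 gamma_h S_ijh
    return V / np.outer(means, means)


# Hypothetical population with three strata (not the data used in the paper).
rng = np.random.default_rng(1)
strata = np.repeat([0, 1, 2], [40, 60, 50])
x = rng.gamma(2.0, 10.0, size=strata.size) + 5.0 * strata
y = 3.0 * x + rng.normal(0.0, 5.0, size=strata.size)
W, gamma, S, means = stratum_summaries(y, x, strata, n_h=[8, 12, 10])
E = error_moments(W, gamma, S, means)
print(np.round(E, 5))
```

The diagonal of `E` gives $E(\varepsilon_0^2)$, $E(\varepsilon_1^2)$, $E(\varepsilon_2^2)$ and the off-diagonal entries give the cross moments; working with the full $3 \times 3$ matrix keeps all six expressions in one place.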

The rest of the paper is organized as follows. Section 2 reviews various estimators under stratified random sampling. Section 3 defines the proposed estimator of the finite population mean under stratified random sampling using dual ancillary information. Section 4 presents a theoretical comparison of the performances of the estimators. Sections 5 and 6 give the numerical investigation and the data description. Section 7 presents the Monte Carlo simulation. Section 8 discusses the numerical results, and Section 9 gives concluding remarks.

2. Literature review

This section reviews various estimators under stratified random sampling that are available in the literature:

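  1. The traditional mean estimator in stratified random sampling is $\bar y_{st} = \sum_{h=1}^{L} W_h \bar y_h$.
    Its variance is given by
    $$V(\bar y_{st}) = \sum_{h=1}^{L} W_h^2 \gamma_h S_{yh}^2, \qquad (1)$$
    where $\gamma_h = \frac{1}{n_h} - \frac{1}{N_h}$.
  2. Cochran [24] suggested the traditional ratio estimator:
    $$\bar y_R = \bar y_{st}\,\frac{\bar X}{\bar x_{st}}. \qquad (2)$$
    The bias and MSE of $\bar y_R$ are given by
    $$B(\bar y_R) \cong \frac{1}{\bar X}\sum_{h=1}^{L} W_h^2 \gamma_h \left(R S_{xh}^2 - S_{yxh}\right) \quad \text{and} \quad MSE(\bar y_R) \cong \sum_{h=1}^{L} W_h^2 \gamma_h \left(S_{yh}^2 + R^2 S_{xh}^2 - 2R S_{yxh}\right), \qquad (3)$$
    where $R = \bar Y/\bar X$.
  3. Murthy [25] suggested the usual product estimator:
    $$\bar y_P = \bar y_{st}\,\frac{\bar x_{st}}{\bar X}. \qquad (4)$$
    The bias and MSE of $\bar y_P$ are given by
    $$B(\bar y_P) \cong \frac{1}{\bar X}\sum_{h=1}^{L} W_h^2 \gamma_h S_{yxh} \quad \text{and} \quad MSE(\bar y_P) \cong \sum_{h=1}^{L} W_h^2 \gamma_h \left(S_{yh}^2 + R^2 S_{xh}^2 + 2R S_{yxh}\right). \qquad (5)$$
  4. Bahl and Tuteja [26] suggested the exponential ratio and product estimators:
    $$\bar y_{BT,R} = \bar y_{st}\exp\left(\frac{\bar X - \bar x_{st}}{\bar X + \bar x_{st}}\right) \qquad (6)$$
    and
    $$\bar y_{BT,P} = \bar y_{st}\exp\left(\frac{\bar x_{st} - \bar X}{\bar x_{st} + \bar X}\right). \qquad (7)$$
    The biases and MSEs of $\bar y_{BT,R}$ and $\bar y_{BT,P}$ are given by
    $$B(\bar y_{BT,R}) \cong \frac{1}{\bar X}\sum_{h=1}^{L} W_h^2 \gamma_h \left(\frac{3}{8} R S_{xh}^2 - \frac{1}{2} S_{yxh}\right), \quad MSE(\bar y_{BT,R}) \cong \sum_{h=1}^{L} W_h^2 \gamma_h \left(S_{yh}^2 + \frac{R^2}{4} S_{xh}^2 - R S_{yxh}\right) \qquad (8)$$
    and
    $$B(\bar y_{BT,P}) \cong \frac{1}{\bar X}\sum_{h=1}^{L} W_h^2 \gamma_h \left(\frac{1}{2} S_{yxh} - \frac{1}{8} R S_{xh}^2\right), \quad MSE(\bar y_{BT,P}) \cong \sum_{h=1}^{L} W_h^2 \gamma_h \left(S_{yh}^2 + \frac{R^2}{4} S_{xh}^2 + R S_{yxh}\right). \qquad (9)$$
  5. The difference estimator is given by
    $$\bar y_D = \bar y_{st} + d(\bar X - \bar x_{st}), \qquad (10)$$
    where $d$ is a suitably chosen constant. The minimum variance of $\bar y_D$, attained at the optimal value $d_{opt} = \sum_{h=1}^{L} W_h^2 \gamma_h S_{yxh} \big/ \sum_{h=1}^{L} W_h^2 \gamma_h S_{xh}^2$, is
    $$V_{min}(\bar y_D) = \sum_{h=1}^{L} W_h^2 \gamma_h S_{yh}^2 - \frac{\left(\sum_{h=1}^{L} W_h^2 \gamma_h S_{yxh}\right)^2}{\sum_{h=1}^{L} W_h^2 \gamma_h S_{xh}^2}. \qquad (11)$$
  6. Rao [27] proposed the following estimator:
    $$\bar y_{Rao} = Q_1 \bar y_{st} + Q_2(\bar X - \bar x_{st}), \qquad (12)$$
    where $Q_1$ and $Q_2$ are constants.
    The bias and MSE of $\bar y_{Rao}$ are given by
    $$B(\bar y_{Rao}) = (Q_1 - 1)\bar Y \quad \text{and} \quad MSE(\bar y_{Rao}) \cong (Q_1 - 1)^2 \bar Y^2 + Q_1^2 \sum_{h=1}^{L} W_h^2 \gamma_h S_{yh}^2 + Q_2^2 \sum_{h=1}^{L} W_h^2 \gamma_h S_{xh}^2 - 2 Q_1 Q_2 \sum_{h=1}^{L} W_h^2 \gamma_h S_{yxh}.$$
    The optimum values of $Q_1$ and $Q_2$ are given by
    $$Q_{1(opt)} = \frac{\bar Y^2}{\bar Y^2 + V_{min}(\bar y_D)} \quad \text{and} \quad Q_{2(opt)} = Q_{1(opt)}\,\frac{\sum_{h=1}^{L} W_h^2 \gamma_h S_{yxh}}{\sum_{h=1}^{L} W_h^2 \gamma_h S_{xh}^2}.$$
    The minimum MSE of $\bar y_{Rao}$ is given by
    $$MSE_{min}(\bar y_{Rao}) = \frac{V_{min}(\bar y_D)}{1 + V_{min}(\bar y_D)/\bar Y^2}. \qquad (13)$$
  7. Singh et al. [28] suggested the estimator
    $$\bar y_S = \bar y_{st}\exp\left(\frac{a(\bar X - \bar x_{st})}{a(\bar X + \bar x_{st}) + 2b}\right), \qquad (14)$$
    where $a$ and $b$ are suitably chosen constants. For $a = 1$ and $b = 0$, $\bar y_S$ reduces to the exponential ratio estimator in (6).
    The bias and MSE of $\bar y_S$ are then given by
    $$B(\bar y_S) \cong \frac{1}{\bar X}\sum_{h=1}^{L} W_h^2 \gamma_h \left(\frac{3}{8} R S_{xh}^2 - \frac{1}{2} S_{yxh}\right) \quad \text{and} \quad MSE(\bar y_S) \cong \sum_{h=1}^{L} W_h^2 \gamma_h \left(S_{yh}^2 + \frac{R^2}{4} S_{xh}^2 - R S_{yxh}\right). \qquad (15)$$
  8. Grover and Kaur [29] suggested the estimator
    $$\bar y_{GK} = \left[Z_1 \bar y_{st} + Z_2(\bar X - \bar x_{st})\right]\exp\left(\frac{a(\bar X - \bar x_{st})}{a(\bar X + \bar x_{st}) + 2b}\right), \qquad (16)$$
    where $Z_1$ and $Z_2$ are unknown constants. For $a = 1$ and $b = 0$,
    the bias and MSE of $\bar y_{GK}$ are given by (17).
    The optimum values of $Z_1$ and $Z_2$ are obtained by minimizing (17),
    and the resulting minimal MSE of $\bar y_{GK}$ is given by (18).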
  9. Ahmad et al. [30] proposed an improved estimator, given by (19), that makes dual use of the auxiliary information,
    where $Q_i$ $(i = 5, 6, 7)$ are constants.
    The bias and MSE of this estimator are given by (20).
    The optimum values of $Q_5$, $Q_6$, and $Q_7$ are obtained by minimizing (20),
    and the resulting minimum MSE is given by (21).
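The first-order expressions for the sample mean, ratio, product, exponential ratio and product, difference, and Rao estimators depend on the population only through $V_y = \sum_h W_h^2\gamma_h S_{yh}^2$, $V_x = \sum_h W_h^2\gamma_h S_{xh}^2$, $V_{yx} = \sum_h W_h^2\gamma_h S_{yxh}$, $\bar Y$, and $R = \bar Y/\bar X$. The sketch below evaluates the variance or MSE in (1), (3), (5), (8), (9), (11), and (13) from stratum-level parameters. The function name `classical_mses` and the two-stratum inputs are hypothetical, and the estimators of Singh et al. [28], Grover and Kaur [29], and Ahmad et al. [30] are not included.

```python
import numpy as np


def classical_mses(W, gamma, S_y2, S_x2, S_yx, Ybar, Xbar):
    """First-order variance/MSE of classical stratified estimators from stratum-level parameters."""
    a = np.asarray(W, dtype=float) ** 2 * np.asarray(gamma, dtype=float)
    V_y = np.sum(a * np.asarray(S_y2, dtype=float))   # sum_h W_h^2 gamma_h S_yh^2
    V_x = np.sum(a * np.asarray(S_x2, dtype=float))   # sum_h W_h^2 gamma_h S_xh^2
    V_yx = np.sum(a * np.asarray(S_yx, dtype=float))  # sum_h W_h^2 gamma_h S_yxh
    R = Ybar / Xbar
    V_diff = V_y - V_yx**2 / V_x  # difference estimator at its optimal d = V_yx / V_x
    return {
        "mean": V_y,                                   # (1)
        "ratio": V_y + R**2 * V_x - 2 * R * V_yx,      # (3)
        "product": V_y + R**2 * V_x + 2 * R * V_yx,    # (5)
        "exp_ratio": V_y + R**2 * V_x / 4 - R * V_yx,  # (8)
        "exp_product": V_y + R**2 * V_x / 4 + R * V_yx,  # (9)
        "difference": V_diff,                          # (11)
        "rao": V_diff / (1 + V_diff / Ybar**2),        # (13)
    }


# Hypothetical two-stratum parameters, chosen only to exercise the formulas.
mses = classical_mses(W=[0.5, 0.5], gamma=[0.1, 0.1], S_y2=[4.0, 4.0],
                      S_x2=[1.0, 1.0], S_yx=[1.5, 1.5], Ybar=10.0, Xbar=5.0)
for name, value in sorted(mses.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} {value:.6f}")
```

Sorting by MSE makes the ranking of the estimators visible at a glance; by construction the Rao estimator never does worse than the optimal difference estimator, which in turn never does worse than $\bar y_{st}$.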

3. Proposed estimator

Suitable use of ancillary information can improve the precision of estimators at both the design and the estimation stages. When the study and ancillary variables are strongly correlated, the rank of the ancillary variable is also correlated with the study variable. Dual use of the ancillary variable has rarely been attempted in the literature, which motivated this work. The principal advantage of the proposed ratio-in-regression type estimator under stratified random sampling is that it is more flexible and more efficient than the existing estimators. Motivated by Ahmad et al. [30], we propose a ratio-in-regression type exponential estimator for estimating the population mean under stratified random sampling:

(22)

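where $Q_{11}$, $Q_{12}$, and $Q_{13}$ are unknown constants.

Expressing the estimator in (22) in terms of the error terms $\varepsilon_0$, $\varepsilon_1$, and $\varepsilon_2$, we obtain (i).

Simplifying (i) up to the first order of approximation, we obtain the MSE of the proposed estimator: (23)

The optimum values of $Q_{11}$, $Q_{12}$, and $Q_{13}$ are obtained by minimizing (23) with respect to these constants.

Substituting the optimum values of $Q_{11}$, $Q_{12}$, and $Q_{13}$ into (23), we obtain the minimum mean square error of the proposed estimator: (24)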
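Finding the optimum constants is a quadratic minimization. Assuming, as is typical for estimators that are linear in their constants, that the first-order MSE can be written as $c + Q^\top A Q - 2 b^\top Q$ with $A$ positive definite, the optimum is $Q_{opt} = A^{-1} b$ and the minimum MSE is $c - b^\top A^{-1} b$. The sketch below checks this recipe on Rao's estimator (12), whose first-order MSE $\bar Y^2(Q_1 - 1)^2 + Q_1^2 V_y + Q_2^2 V_x - 2 Q_1 Q_2 V_{yx}$ gives $c = \bar Y^2$, $b = (\bar Y^2, 0)$, and $A = \begin{pmatrix} \bar Y^2 + V_y & -V_{yx} \\ -V_{yx} & V_x \end{pmatrix}$, with $V_y$, $V_x$, $V_{yx}$ the weighted sums $\sum_h W_h^2\gamma_h S_{yh}^2$, $\sum_h W_h^2\gamma_h S_{xh}^2$, $\sum_h W_h^2\gamma_h S_{yxh}$. Under the stated assumption the same approach would apply to (23) with a $3 \times 3$ matrix; the function name `optimal_constants` and the numeric inputs are hypothetical.

```python
import numpy as np


def optimal_constants(A, b, c):
    """Minimise MSE(Q) = c + Q^T A Q - 2 b^T Q; return (Q_opt, MSE_min)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    Q = np.linalg.solve(A, b)  # normal equations: A Q = b
    return Q, c - b @ Q


# Hypothetical values of Ybar, V_y, V_x, V_yx.
Ybar, V_y, V_x, V_yx = 10.0, 0.2, 0.05, 0.075
A = [[Ybar**2 + V_y, -V_yx], [-V_yx, V_x]]
b = [Ybar**2, 0.0]
Q_opt, mse_min = optimal_constants(A, b, c=Ybar**2)

# Closed form for Rao's estimator, for comparison.
V_diff = V_y - V_yx**2 / V_x
print(Q_opt, mse_min, V_diff / (1 + V_diff / Ybar**2))
```

Solving the linear system with `np.linalg.solve` avoids forming $A^{-1}$ explicitly, and the numerical optimum agrees with the closed form in (13).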

4. Theoretical comparison

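In this section, the suggested estimator is compared theoretically with the existing estimators. The proposed estimator is more efficient than an existing estimator whenever the variance or MSE of the existing estimator exceeds the minimum MSE of the proposed estimator given in (24). The corresponding conditions are obtained by comparing:

  1. (1) and (24), for the traditional mean estimator;
  2. (3) and (24), for the ratio estimator of Cochran [24];
  3. (5) and (24), for the product estimator of Murthy [25];
  4. (8) and (24), for the exponential ratio estimator of Bahl and Tuteja [26];
  5. (9) and (24), for the exponential product estimator of Bahl and Tuteja [26];
  6. (11) and (24), for the difference estimator;
  7. (13) and (24), for the estimator of Rao [27];
  8. (15) and (24), for the estimator of Singh et al. [28];
  9. (18) and (24), for the estimator of Grover and Kaur [29];
  10. (21) and (24), for the estimator of Ahmad et al. [30].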

5. Numerical investigation

In this section, we carry out a numerical study of how well the existing and suggested estimators perform, using two populations. The percentage relative efficiencies (PREs) are computed as

$$PRE(T_i) = \frac{V(\bar y_{st})}{MSE(T_i)} \times 100,$$

where $T_i$ denotes each of the existing and proposed estimators.
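The PRE is a simple ratio of MSEs. A minimal sketch, assuming the reference estimator is the sample mean $\bar y_{st}$ and using hypothetical MSE values (the function name `percentage_relative_efficiency` is also hypothetical):

```python
def percentage_relative_efficiency(mse_reference, mse_estimator):
    """PRE of an estimator with respect to a reference estimator, in percent."""
    return 100.0 * mse_reference / mse_estimator


# Hypothetical MSE values; the reference here is the sample mean estimator.
mses = {"mean": 0.2, "ratio": 0.1, "exp_ratio": 0.1, "difference": 0.0875}
pres = {name: percentage_relative_efficiency(mses["mean"], m) for name, m in mses.items()}
for name, value in pres.items():
    print(f"{name:12s} {value:8.2f}")
```

A PRE above 100 means the estimator has a smaller MSE than the reference; the reference itself always has a PRE of 100.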

6. Data description

To show the efficiency of the proposed estimator over the existing estimators, we conduct a numerical study using two real data sets from Kadilar and Cingi [18]. For Population I, summary statistics are given in Tables 1 and 2, and the MSEs and PREs are presented in Table 3; for Population II, summary statistics are given in Tables 4 and 5, and the MSEs and PREs are presented in Table 6. For Population I, the study variable is apple production in 1999 and the auxiliary variable is the number of apple trees in 1999. For Population II, the study variable is apple production in 1999 and the auxiliary variable is the number of apple trees in 1998 (Source: Institute of Statistics, Republic of Turkey). The data are stratified by the regions of Turkey (1: Marmara, 2: Aegean, 3: Mediterranean, 4: Central Anatolia, 5: Black Sea, 6: East and Southeast Anatolia), and samples are drawn randomly from each stratum, with the stratum sample sizes computed by Neyman allocation.
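Neyman allocation assigns $n_h = n N_h S_{yh} / \sum_k N_k S_{yk}$, so strata that are larger or more variable receive more sample units. A minimal sketch, assuming the stratum standard deviations $S_{yh}$ are known and using largest-remainder rounding so that the $n_h$ sum to $n$ (the rounding rule used in the paper is not stated); `neyman_allocation` and the inputs are hypothetical.

```python
import numpy as np


def neyman_allocation(n, N_h, S_h):
    """Neyman allocation n_h = n * N_h * S_h / sum(N_k * S_k), rounded to integers summing to n."""
    N_h = np.asarray(N_h, dtype=float)
    S_h = np.asarray(S_h, dtype=float)
    raw = n * N_h * S_h / np.sum(N_h * S_h)
    n_h = np.floor(raw).astype(int)
    # give the units lost to flooring to the strata with the largest fractional parts
    extra = int(n - n_h.sum())
    n_h[np.argsort(raw - n_h)[::-1][:extra]] += 1
    return n_h  # note: not capped at N_h, and no minimum of 2 per stratum is enforced


# Hypothetical stratum sizes and standard deviations of y.
print(neyman_allocation(64, N_h=[60, 80, 50, 90, 70, 50], S_h=[12.0, 20.0, 8.0, 30.0, 15.0, 9.0]))
```

In practice the $S_{yh}$ must come from prior information (for example a previous survey), and a production version would also enforce $2 \le n_h \le N_h$ so that stratum variances can be estimated.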

Table 2. Correlations and covariances for Population I.

https://doi.org/10.1371/journal.pone.0275875.t002

Table 5. Correlations and covariances for Population II.

https://doi.org/10.1371/journal.pone.0275875.t005

Population I: (Source: Kadilar and Cingi [18])

Y is the apple production in 1999, and X is the number of apple trees in 1999.

Population II: (Source: Kadilar and Cingi [18])

Y is the apple production in 1999, and X is the number of apple trees in 1998.

7. Simulation study

The preceding section demonstrated the efficiency of the proposed estimator over the competing estimators. A Monte Carlo simulation in R is also used to assess the efficiency of the proposed estimator using dual ancillary information under stratified random sampling. The proposed estimator is compared with the existing estimators using the percentage relative efficiency (PRE). Again, the real population of Kadilar and Cingi [18] is used. The simulation study is conducted in R using the following steps.

  1. We consider different sample sizes (64, 80, 96, 128, 144, 160, 176, 192, 208, and 240), allocated to the strata by proportional allocation.
  2. The population is divided into six strata, and stratified random samples are drawn 100,000 times to compute the values of the proposed and existing estimators.
  3. From the 100,000 samples, the values of the existing and proposed estimators and their corresponding variances are computed.
  4. The PRE values are derived from the variances of all the existing and proposed estimators and are provided in Table 7.
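The simulation steps above can be sketched in Python (the paper used R). This is a minimal illustration under stated assumptions: a synthetic six-stratum population instead of the Kadilar and Cingi data, proportional allocation with largest-remainder rounding, 2,000 replications instead of 100,000, and only the sample mean, the ratio estimator (2), and the exponential ratio estimator (6). MSEs are computed empirically as the mean squared deviation of the replicated estimates from $\bar Y$, and PREs are taken with respect to $\bar y_{st}$. The function names are hypothetical.

```python
import numpy as np


def proportional_allocation(n, N_h):
    """n_h proportional to N_h, rounded by largest remainder so that sum(n_h) == n."""
    N_h = np.asarray(N_h)
    raw = n * N_h / N_h.sum()
    n_h = np.floor(raw).astype(int)
    extra = int(n - n_h.sum())
    n_h[np.argsort(raw - n_h)[::-1][:extra]] += 1
    return n_h


def simulate(y, x, strata, n, reps=2000, seed=0):
    """Empirical MSE and PRE (w.r.t. the sample mean) under stratified SRSWOR."""
    rng = np.random.default_rng(seed)
    groups = [np.flatnonzero(strata == label) for label in np.unique(strata)]
    N_h = np.array([g.size for g in groups])
    W = N_h / N_h.sum()
    n_h = proportional_allocation(n, N_h)
    Ybar, Xbar = y.mean(), x.mean()
    est = {"mean": np.empty(reps), "ratio": np.empty(reps), "exp_ratio": np.empty(reps)}
    for k in range(reps):
        # simple random sampling without replacement within each stratum
        samples = [rng.choice(g, size=m, replace=False) for g, m in zip(groups, n_h)]
        y_st = sum(w * y[s].mean() for w, s in zip(W, samples))
        x_st = sum(w * x[s].mean() for w, s in zip(W, samples))
        est["mean"][k] = y_st
        est["ratio"][k] = y_st * Xbar / x_st
        est["exp_ratio"][k] = y_st * np.exp((Xbar - x_st) / (Xbar + x_st))
    mse = {name: np.mean((vals - Ybar) ** 2) for name, vals in est.items()}
    pre = {name: 100.0 * mse["mean"] / m for name, m in mse.items()}
    return mse, pre


# Hypothetical population with six strata (not the Kadilar and Cingi data).
rng = np.random.default_rng(42)
strata = np.repeat(np.arange(6), [60, 80, 50, 90, 70, 50])
x = rng.lognormal(mean=3.0 + 0.2 * strata, sigma=0.5, size=strata.size)
y = 2.5 * x + rng.normal(0.0, 5.0, size=strata.size)
mse, pre = simulate(y, x, strata, n=64)
for name in mse:
    print(f"{name:10s} MSE={mse[name]:10.4f}  PRE={pre[name]:8.2f}")
```

Repeating the call for each sample size in step 1 gives one column of PREs per sample size; the estimator set would be extended with the other estimators once their formulas are available.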
Table 7. Simulation results for the PRE of the recommended estimator with respect to the existing estimators for different sample sizes.

https://doi.org/10.1371/journal.pone.0275875.t007

These results show that the proposed estimator performs best among all the existing estimators under consideration.

Advantages of Monte Carlo simulation

Monte Carlo simulation is useful when the sampling distribution of an estimator, and hence its exact bias and MSE, is difficult to determine analytically. It approximates such quantities by repeatedly drawing random samples.

In a Monte Carlo simulation, each uncertain quantity is assigned a random value, the model is run, and the result is recorded. This process is repeated many times with different random values for the quantities in question. Once the simulation is complete, the results are averaged to provide an estimate.

8. Discussion

To evaluate the advantage of the proposed estimator under stratified random sampling, we use two real data sets for numerical comparison. The numerical results presented in Tables 3 and 6 show that the proposed ratio-in-regression type estimator is more efficient than the usual sample mean estimator and the estimators of Cochran [24], Murthy [25], Bahl and Tuteja [26], the difference estimator, Rao [27], Singh et al. [28], Grover and Kaur [29], and Ahmad et al. [30].

Table 7 gives the simulation results for the PRE of the proposed estimator with respect to the existing estimators for sample sizes of 64, 80, 96, 128, 144, 160, 176, 192, 208, and 240. The PRE values vary with the sample size. The simulation results also show that the proposed estimator is more efficient than its existing counterparts in terms of PRE, and that its efficiency increases as the sample size increases. Overall, the proposed estimator gives the largest gain in efficiency among all the estimators considered.

For visualization, the PREs of the proposed and existing estimators are compared in Fig 1, where a higher line indicates a more efficient estimator. In general, we recommend the proposed estimator over the existing estimators examined in this paper for estimating the finite population mean in new surveys under stratified random sampling.

Fig 1. Percentage relative efficiency of different estimators.

https://doi.org/10.1371/journal.pone.0275875.g001

9. Conclusion

In this article, we propose a ratio-in-regression type exponential estimator for the finite population mean under stratified random sampling, which uses the sample means of an ancillary variable and of its rank. Expressions for the mean square error of the proposed estimator are derived up to the first order of approximation, and the estimator is compared with the estimators discussed herein. The results for the real data sets show that the proposed estimator performs better than its existing counterparts, and this numerical study supports the theoretical results. A simulation study is also carried out to assess the robustness and generalizability of the proposed estimator, and its findings also confirm the utility of the proposed estimator. We therefore recommend the proposed estimator for efficiently estimating the finite population mean under stratified random sampling. The current work can be extended to develop improved classes of estimators under two-phase sampling, non-response, and two-stage sampling schemes, and to the estimation of the cumulative distribution function, using information on the ancillary variable under simple and stratified random sampling.

References

  1. Zaman T, Kadilar C. Exponential ratio and product type estimators of the mean in stratified two-phase sampling. AIMS Mathematics. 2021 Jan 1;6(5):4265–79.
  2. Zaman T, Kadilar C. On estimating the population mean using auxiliary character in stratified random sampling. Journal of Statistics and Management Systems. 2020 Nov 16;23(8):1415–26.
  3. Zaman T. Efficient estimators of population mean using auxiliary attribute in stratified random sampling. Advances and Applications in Statistics. 2019;56(2):153–71.
  4. Zaman T. An efficient exponential estimator of the mean under stratified random sampling. Mathematical Population Studies. 2021 Apr 3;28(2):104–21.
  5. Rather KU, Kadilar C. Dual to ratio cum product type of exponential estimator for population mean in stratified random sampling.
  6. Mradula, Yadav SK, Varshney R, Dube M. Efficient estimation of population mean under stratified random sampling with linear cost function. Communications in Statistics-Simulation and Computation. 2021 Dec 2;50(12):4364–87.
  7. Javed M, Irfan M, Bhatti SH, Onyango R. A simulation-based study for progressive estimation of population mean through traditional and nontraditional measures in stratified random sampling. Journal of Mathematics. 2021 Dec 20;2021.
  8. Javed M, Irfan M. A simulation study: new optimal estimators for population mean by using dual auxiliary information in stratified random sampling. Journal of Taibah University for Science. 2020 Jan 1;14(1):557–68.
  9. Yadav R, Tailor R. Estimation of finite population mean using two auxiliary variables under stratified random sampling. Statistics in Transition New Series. 2020 Mar 18;21(1).
  10. Zaman T, Bulut H. Modified regression estimators using robust regression methods and covariance matrices in stratified random sampling. Communications in Statistics-Theory and Methods. 2020 Jul 17;49(14):3407–20.
  11. Kumar M, Vishwakarma GK. Generalized classes of regression-cum-ratio estimators of population mean in stratified random sampling. Proceedings of the National Academy of Sciences, India Section A: Physical Sciences. 2020 Dec;90(5):933–9.
  12. Zahid E, Shabbir J, Alamri OA. A generalized class of estimators for sensitive variable in the presence of measurement error and non-response under stratified random sampling. Journal of King Saud University-Science. 2022 Feb 1;34(2):101741.
  13. Aladag S, Cingi H. Improvement in estimating the population median in simple random sampling and stratified random sampling using auxiliary information. Communications in Statistics-Theory and Methods. 2015 Mar 4;44(5):1013–32.
  14. Grover LK, Kaur P. A generalized class of ratio type exponential estimators of population mean under linear transformation of auxiliary variable. Communications in Statistics-Simulation and Computation. 2014 Jan 1;43(7):1552–74.
  15. Shabbir J, Gupta S. Estimation of finite population mean in simple and stratified random sampling using two auxiliary variables. Communications in Statistics-Theory and Methods. 2017 Oct 18;46(20):10135–48.
  16. Singh GN, Khalid M. Exponential chain dual to ratio and regression type estimators of population mean in two-phase sampling. Statistica. 2015 Dec 30;75(4):379–89.
  17. Singh GN, Khalid M. Efficient class of estimators for finite population mean using auxiliary information in two-occasion successive sampling. Journal of Modern Applied Statistical Methods. 2019;17(2):14.
  18. Kadilar C, Cingi H. Ratio estimators in stratified random sampling. Biometrical Journal: Journal of Mathematical Methods in Biosciences. 2003;45(2):218–25.
  19. Kadilar C, Cingi H. A new ratio estimator in stratified random sampling. Communications in Statistics-Theory and Methods. 2005 Mar 1;34(3):597–602.
  20. Koyuncu N, Kadilar C. Ratio and product estimators in stratified random sampling. Journal of Statistical Planning and Inference. 2009 Aug 1;139(8):2552–8.
  21. Koyuncu N, Kadilar C. On improvement in estimating population mean in stratified random sampling. Journal of Applied Statistics. 2010 Jun 1;37(6):999–1013.
  22. Singh HP, Vishwakarma GK. A family of estimators of population mean using auxiliary information in stratified sampling. Communications in Statistics-Theory and Methods. 2008 Feb 13;37(7):1038–50.
  23. Hussain S, Ahmad S, Saleem M, Akhtar S. Finite population distribution function estimation with dual use of auxiliary information under simple and stratified random sampling. PLoS One. 2020 Sep 28;15(9):e0239098. pmid:32986764
  24. Cochran WG. The estimation of the yields of cereal experiments by sampling for the ratio of grain to total produce. The Journal of Agricultural Science. 1940 Apr;30(2):262–75.
  25. Murthy MN. Product method of estimation. Sankhyā: The Indian Journal of Statistics, Series A. 1964 Jul 1:69–74.
  26. Bahl S, Tuteja R. Ratio and product type exponential estimators. Journal of Information and Optimization Sciences. 1991 Jan 1;12(1):159–64.
  27. Rao TJ. On certain methods of improving ratio and regression estimators. Communications in Statistics-Theory and Methods. 1991 Jan 1;20(10):3325–40.
  28. Singh R, Chauhan P, Sawan N, Smarandache F. Improvement in estimating the population mean using exponential estimator in simple random sampling. Auxiliary Information and a priori Values in Construction of Improved Estimators. 2007;33.
  29. Grover LK, Kaur P. Ratio type exponential estimators of population mean under linear transformation of auxiliary variable: theory and methods. South African Statistical Journal. 2011 Jan 1;45(2):205–30.
  30. Ahmad S, Hussain S, Aamir M, Yasmeen U, Shabbir J, Ahmad Z. Dual use of auxiliary information for estimating the finite population mean under the stratified random sampling scheme. Journal of Mathematics. 2021 Nov 16;2021.