# Multivariate Multi-Objective Allocation in Stratified Random Sampling: A Game Theoretic Approach

• Ijaz Hussain,
• Alaa Mohamd Shoukry

## Abstract

We consider the problem of multivariate multi-objective allocation when little or no information is available about the within-stratum variances. Results show that a game theoretic approach (based on weighted goal programming) can be applied to sample size allocation problems. We use a simulation technique to determine the payoff matrix and to solve a minimax game.

## Introduction

The choice of a sampling plan is fundamental to any statistical study because it underlies the estimates of population parameters. In a stratified random sampling design, a sample size must be allocated to each stratum. An optimum allocation can be applied to each characteristic provided that sufficient information about stratum variability is available. However, the optimization technique can lead to misleading results when information about cost and variance is limited, because the precision (1/variance) and cost actually achieved become apparent only during implementation. For further discussion, see [1–3], [4, 5] and [6].

In this study, we propose a multivariate game theoretic approach to the sample size allocation problem in stratified random sampling design. Many techniques are used to allocate sample sizes, such as proportional allocation and optimal allocation. These techniques are suitable when a sampling frame and other relevant information, such as population variances, are available. The proposed technique is intended for cases in which i) little or no information about the population is available; ii) one cannot be optimistic that the sample results will hold on average; iii) variability among sampling units may be high; or iv) one wants to guard against an adverse-case scenario for the variation in the sample.

In such games, the proportional allocation technique is computationally feasible and generally applied ([7]). The within-stratum variance is vital for a game theoretic approach. In the univariate case, [8] formulated a minimax allocation problem that is a function of specified minimum upper bounds for each of the stratum variances.

[9] presented “a game theoretic formulation for the multivariate case, where the covariances between pairs of responses are supposed to be constant from stratum to stratum”. Moreover, these strategies are functions of the stratum variances and covariances. [10] discussed an optimum allocation for a multivariate design that “minimizes the cost of obtaining estimates with smaller errors than previously specified numbers with confidence level”. They also showed that variance information could be useful for obtaining a nearly optimum allocation.

[11] obtained posterior variances by using prior information on both the mean and the variance. [12] and [13] proposed a “method of allocation in multivariate surveys where various stratum variances are assumed to be known”. It minimizes the cost of obtaining estimates with variances smaller than those of its predecessor.

In the game theory literature, many authors have discussed various models of two-player games. A traveler’s dilemma game (TDG) model on two coupled lattices was presented by [14], which investigates the effects of coupling on cooperation. A simulation study of this model indicates that cooperation behavior varies over the lattices. A two-player game between a cooperator and a defector was discussed in [15], which simulated utility coupling on a weighted lattice. Another two-player game on square lattices, using different weights for the available strategies, was modeled in [16]. A risk aversion model in which players’ participation is probabilistic was presented in [17]. [18] modeled a two-player game that considers reputation and behavior diversity, which vary over the strategy space. Simulation results show that cooperation behavior is influenced by the reputation index.

Allocation in multivariate surveys must be optimum for all characteristics. Examples are an allocation that minimizes the cost vector or the variance functions, or one that maximizes the relative efficiency compared with other allocations. Detailed discussions are given in [19], [20–22], [23–26], [27–30], [31] and [32].

The second section explains the sampling notation. We set up a multi-objective allocation game in Section 3. Section 4 explains the methodology of our approach, and the results are discussed in Section 5.

## Sampling notations

### Population

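Suppose we have a population of size $N$ that is divided into $L$ mutually exclusive strata of sizes $N_h$, where $\sum_{h=1}^{L} N_h = N$. Consider a data set $Y_{jhi}$ for $j = 1, 2, \ldots, Q$ characteristics and $h = 1, 2, \ldots, L$ strata, with $i = 1, 2, 3, \ldots, N_h$ sampling units in the $h$th stratum. $\bar{Y}_{jh} = \frac{1}{N_h}\sum_{i=1}^{N_h} Y_{jhi}$ is the population mean of the $j$th characteristic in the $h$th stratum.

Let $W_h = N_h/N$ be the weight of the $h$th stratum and $S_{jh}^2$ the population variance of the $j$th study variable, which can be calculated within the $h$th stratum as

$$S_{jh}^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}\left(Y_{jhi} - \bar{Y}_{jh}\right)^2.$$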

### Sample

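We draw a simple random sample of size $n_h$ independently from each stratum such that $\sum_{h=1}^{L} n_h = n$. Let $\bar{y}_{j,st}$ be the stratified estimator of the population mean of characteristic $j$, which is given as

$$\bar{y}_{j,st} = \sum_{h=1}^{L} W_h \bar{y}_{jh},$$

where $\bar{y}_{jh} = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{jhi}$. The variance of $\bar{y}_{j,st}$ is

$$V(\bar{y}_{j,st}) = \sum_{h=1}^{L} W_h^2 S_{jh}^2\left(\frac{1}{n_h} - \frac{1}{N_h}\right).$$

Ignoring the term independent of $n_h$, we have

$$V(\bar{y}_{j,st}) = \sum_{h=1}^{L} \frac{W_h^2 S_{jh}^2}{n_h}. \tag{1}$$

If our interest lies in the squared coefficient of variation instead of the variance, we can use the following expression:

$$(CV)_j^2 = \frac{V(\bar{y}_{j,st})}{\bar{Y}_j^2},$$

where $\bar{Y}_j = \sum_{h=1}^{L} W_h \bar{Y}_{jh}$ is the overall population mean of characteristic $j$. Substituting the value of $V(\bar{y}_{j,st})$ from Eq 1 in the above equation, we have

$$(CV)_j^2 = \sum_{h=1}^{L} \frac{W_h^2 S_{jh}^2}{n_h \bar{Y}_j^2}. \tag{2}$$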
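
As a rough illustration of the notation in this section, the sketch below computes the stratum weights, means and variances for one characteristic and then the squared CV of the stratified mean for a given allocation. It assumes the standard textbook forms: Eq (1) as the sum of $W_h^2 S_{jh}^2 / n_h$ over strata, and Eq (2) as that sum divided by the squared overall mean. The function names and the small data set are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def stratum_summary(strata):
    """Return (weights, means, variances) for one characteristic.

    `strata[h]` is the list of population values in stratum h.
    """
    total = sum(len(s) for s in strata)
    weights = [len(s) / total for s in strata]      # W_h = N_h / N
    means = [mean(s) for s in strata]               # stratum means
    variances = [variance(s) for s in strata]       # S_h^2, divisor N_h - 1
    return weights, means, variances


def cv_squared(strata, allocation):
    """Squared CV of the stratified mean for allocation (n_1, ..., n_L)."""
    weights, means, variances = stratum_summary(strata)
    # Assumed form of Eq (1): the term that does not depend on n_h is dropped.
    v = sum(w * w * s2 / n for w, s2, n in zip(weights, variances, allocation))
    overall_mean = sum(w * m for w, m in zip(weights, means))
    return v / overall_mean ** 2


# Hypothetical two-stratum population (illustrative values only).
strata = [[52.0, 61.0, 58.0, 70.0, 66.0, 49.0], [80.0, 82.0, 79.0, 85.0]]
print(cv_squared(strata, (3, 2)))
```

Under this assumed form, doubling every $n_h$ halves the squared CV, which is why the budget constraint, rather than the objective alone, determines the allocation.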

## Game setting in a multi-objective allocation problem

We draw a simple random sample from all strata such that $\sum_{h=1}^{L} n_h = n$, assuming a finite population. The objective is to minimize some vector function of the coefficients of variation (CV) when allocating the sample to the strata. For a single characteristic, say $j$, the squared CV of the mean estimator is given by Eq (2).

In particular, an optimum allocation of a sample of size $n$ is a choice of the $n_h$ that minimizes Eq (2) subject to the restriction $\sum_{h=1}^{L} n_h = n$. Such an allocation can be computed only if the values $S_{jh}^2$ and $\bar{Y}_{jh}$ are known ([33]). When they are not, we can use their unbiased estimators $s_{jh}^2$ and $\bar{y}_{jh}$, respectively. Let $z_{jh}$ denote the squared CV for characteristic $j$ in stratum $h$ computed from the sample with these estimators, as defined in Eq (3).

### Players: Sampler (player 1) and Adversary (player 2)

If we consider the sampler as player 1, $z_{jh}$ from Eq (3) is his loss in a zero-sum game against the Adversary (player 2) for characteristic $j$ in stratum $h$. The sampler seeks an allocation that is a good strategy for playing this game, i.e., one that minimizes the vector $z_h = (z_{1h}, z_{2h}, \ldots, z_{Qh})$ for all $h = 1, 2, \ldots, L$. The space of strategies (allocations) available to the sampler is denoted by $\nu$ and given in Eq (4).

The Adversary then selects an independent sample from each stratum according to the strategy offered by the sampler. The objective of the Adversary is to choose, from each stratum $h = 1, 2, \ldots, L$, a vector of sample values that maximizes $z_h$.

This is a seemingly natural way to proceed and may lead to interesting results. The Adversary’s strategies follow from a multi-objective goal program that maximizes the vector $z_h$ within each stratum for a particular $n_h$ ($h = 1, 2, \ldots, L$). The Adversary’s strategy space $\Delta$ is described in Eq (5).

### Payoff matrix of Sampler (player 1)

In a zero-sum game, each player tries to optimize his gain or loss. The minimax idea is to minimize the possible loss in the worst-case (maximum loss) scenario. In general, a minimax strategy may be a mixed strategy. Both players choose among alternative strategies and move simultaneously. The idea can also be extended to more complex games.

The sampler would like to minimize the vector $z_h$, where $z_h$ is defined above. The payoff of the sampler is the gain of the Adversary, which can be determined by the multi-objective program in Eq (6).

This can be equivalently written as a matrix $\Sigma_{\nu \times \Delta}$. Each row of $\Sigma$ represents the loss of the sampler for a possible allocation, and each column of $\Sigma$ represents the gain of the Adversary for a strategy offered by the sampler.

### Minimax game for allocation

Assume that the sampler and the Adversary each choose a strategy. This implies that the sampler picks an allocation vector and the Adversary picks a sample of actual data according to Eq (6).

The Adversary will choose a strategy that maximizes $z_h = (z_{1h}, z_{2h}, z_{3h}, \ldots, z_{Qh})$ for all $h = 1, 2, \ldots, L$. The sampler’s objective is therefore to minimize this maximum (worst) value within the available budget while allocating a sample of size $n$ to all strata. Obviously, a larger sample would produce a better result if there were no restriction on the budget. The optimal program considers all possible choices of sample, where the Adversary can choose his strategies independently. In summary, the objective of the sampler under the budgetary condition is given in Eq (7).

Theorem: In the game described above, i.e., $(\nu, \Delta, CV^2)$,

• A good strategy for Adversary is
• A good strategy for the sampler is .
• An optimal solution Z exists in the allocation problem game, as described in program Eq (7) where Z is the value of the game.
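
The minimax idea of this section can be sketched on a small payoff matrix. The example below is a simplification that considers pure strategies only: for each row (an allocation) it takes the largest loss the Adversary can inflict, then picks the row whose worst loss is smallest. The matrix entries and the function name `minimax_row` are hypothetical and serve only to illustrate the mechanics.

```python
def minimax_row(payoff):
    """Return (row_index, value) of the pure-strategy minimax choice.

    payoff[r][c] is the sampler's loss when the sampler plays row r
    (an allocation) and the Adversary plays column c.
    """
    # Worst case per row: the Adversary picks the column with the largest loss.
    worst = [max(row) for row in payoff]
    # The sampler picks the row whose worst case is smallest.
    best_row = min(range(len(payoff)), key=worst.__getitem__)
    return best_row, worst[best_row]


# Hypothetical losses for three allocations against three Adversary strategies.
payoff = [
    [0.040, 0.055, 0.048],
    [0.030, 0.036, 0.033],
    [0.025, 0.060, 0.041],
]
row, value = minimax_row(payoff)
print(row, value)  # → 1 0.036
```

Note that the third row contains the smallest single entry (0.025) but also the largest worst case (0.060), so a sampler guarding against the adverse scenario would not choose it.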

## Solution of the allocation game

The solution of the allocation problem can be formulated as in the previous section. The idea is to understand the structure of the problem, which will enable us to extend it to more complex cases. Consider the sampler’s problem from Eq (7). For a given allocation, the inner maximization problem given in Eq (6) is solved using any suitable goal programming technique ([22], [23], [28–31], [19, 34] and [35]).

The Adversary computes the maximum weighted sum over all characteristics ($j = 1, 2, \ldots, Q$) using Eq (3) and model (6). This exercise is repeated for all strata ($h = 1, 2, \ldots, L$).

We allow our generic goal program to have $Q$ goals, indexed by $j = 1, \ldots, Q$. The $n_h$ are the decision variables, i.e., the factors that the decision maker(s) control and that determine the decisions to be made. Each goal has an achieved value, $z_{jh}$, on its underlying criterion; $z_{jh}$ is a function of the compromise decision variables for the $j$th goal. The whole situation is expressed in Eq (8).

The above program can be expressed as a Weighted Goal Programming (WGP) model if $f_{1h}, f_{2h}, \cdots, f_{Qh}$ represent weighted functions in their respective priority. The WGP is formulated to maximize a composite objective function formed as a weighted sum of the coefficients of variation in the respective strata.
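
A minimal sketch of the weighted-sum step may help here. It assumes that each candidate sample in a stratum has already been reduced to its goal values $(z_{1h}, \ldots, z_{Qh})$ and that the composite objective is the plain weighted sum of those values; the helper names, candidate values and weights below are hypothetical.

```python
def weighted_sum(z, weights):
    """Scalarize goal values z = (z_1, ..., z_Q) with weights (w_1, ..., w_Q)."""
    if len(z) != len(weights):
        raise ValueError("one weight per goal is required")
    return sum(w * v for w, v in zip(weights, z))


def best_candidate(candidates, weights):
    """Pick the candidate whose weighted sum of goal values is largest."""
    return max(candidates, key=lambda z: weighted_sum(z, weights))


# Hypothetical goal values (z_1h, z_2h) for three candidate samples in one stratum.
candidates = [(0.020, 0.004), (0.015, 0.012), (0.010, 0.003)]
print(best_candidate(candidates, (0.5, 0.5)))  # equal weights
print(best_candidate(candidates, (0.9, 0.1)))  # emphasis on the first goal
```

With equal weights the second candidate wins, while a heavier weight on the first goal favors the first candidate, which shows how the choice of weights can change the Adversary's selection.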

The optimal strategy for the sampler is obtained from model (7). Whatever strategy the sampler chooses, the Adversary will sample from every stratum so as to maximize model (6). Therefore, it is a minimax sampling scheme.

## Numerical Illustration

This idea of sample selection is applied to real data on Master of Philosophy inductions (Table 1) into the Department of Statistics, Quaid-i-Azam University, Islamabad, Fall 2014. Stratum 1 consists of inductees from other universities and stratum 2 of inductees who are QAU graduates. The data represent the ‘test plus interview’ marks and the ‘academic record’ marks. A stratified random sample is to be selected from the given data. The cost of selecting a sampling unit is Rs. 2000 from stratum 1 and Rs. 1000 from stratum 2 (an estimate of the traveling cost in local units, for sampling purposes only). Suppose we have a budget of Rs. 15000 only and there is no initial cost of sampling, i.e., $C_0 = 0$.
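
The budget restriction described above limits the allocations the sampler can offer. The sketch below enumerates the pairs $(n_1, n_2)$ whose total cost does not exceed the budget, using the stated unit costs (Rs. 2000 and Rs. 1000), budget (Rs. 15000) and zero fixed cost. Two things are assumptions of this sketch rather than statements from the paper: a minimum of two units per stratum (so that a sample variance can be computed) and the absence of an upper bound from the stratum sizes, which are not reproduced here.

```python
def feasible_allocations(costs, budget, fixed_cost=0, min_per_stratum=2):
    """Enumerate (n_1, n_2) with fixed_cost + c_1*n_1 + c_2*n_2 <= budget."""
    c1, c2 = costs
    remaining = budget - fixed_cost
    allocations = []
    n1 = min_per_stratum
    # Increase n_1 until even the smallest n_2 no longer fits in the budget.
    while c1 * n1 + c2 * min_per_stratum <= remaining:
        max_n2 = (remaining - c1 * n1) // c2
        for n2 in range(min_per_stratum, max_n2 + 1):
            allocations.append((n1, n2))
        n1 += 1
    return allocations


allocs = feasible_allocations(costs=(2000, 1000), budget=15000)
# Allocations that spend the whole budget.
on_budget = [a for a in allocs if 2000 * a[0] + 1000 * a[1] == 15000]
print(on_budget)  # → [(2, 11), (3, 9), (4, 7), (5, 5), (6, 3)]
```

The allocation (6, 3) reported later as the optimal sampling strategy lies on this budget boundary.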

### Computation of payoff matrix

We use model (6) to compute the payoff matrix. Let the two characteristics be the test and interview marks (T & I) and the academic record marks (Ac. Rec.). Both have equal importance because the total marks are the selection criterion. Model (6) can be represented as;

We compute the payoff matrix of the sampler using the equation below for various combinations of $(n_1, n_2)$ that satisfy the budget constraint $2000 n_1 + 1000 n_2 \le 15000$.

The above formulation can be expressed as;

The problem arises because the Adversary requires a sample of actual values to maximize the sum of $z_{jh}$ over the characteristics $j = 1, 2$ for a given $(n_1, n_2)$. This is feasible under the given cost; however, we chose a simulation technique for this purpose. We sampled more than twice the total number of possible samples. The population is known and finite, and sampling is done with replacement for the characteristic vector as well as for all possible sizes under the budgetary restriction. We were able to run a loop of at most $20 \times 10^6$ randomly selected samples. The simulation returns the maximum value of the sum of $z_{jh}$ over both characteristics $j = 1, 2$ across the whole simulation loop. Results are given in Table 2. Our simulation technique differs from that of [36], where the author simulates thousands of hypothetical populations to identify significant factors when selecting samples under different methodologies.
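
The simulation loop described above can be sketched as follows. This is not the authors' code: the form of the loss, $W_h^2 s_{jh}^2 / (n_h \bar{y}_{jh}^2)$, is an assumption made for illustration, the stratum data and weight are hypothetical, and the number of draws is far smaller than the $20 \times 10^6$ used in the study.

```python
import random
from statistics import mean, variance


def z_value(sample, weight):
    """Assumed loss for one characteristic: W_h^2 * s^2 / (n_h * ybar^2)."""
    n = len(sample)
    return weight ** 2 * variance(sample) / (n * mean(sample) ** 2)


def adversary_max(stratum, n_h, weight, draws, rng):
    """Largest sum of z over characteristics found in `draws` random samples.

    `stratum` is a list of units; each unit is a tuple of Q characteristic
    values. Units are drawn with replacement, as in the simulation above.
    """
    best = 0.0  # z is non-negative, so 0.0 is a safe starting value
    for _ in range(draws):
        units = rng.choices(stratum, k=n_h)
        columns = zip(*units)  # one column per characteristic
        total = sum(z_value(list(col), weight) for col in columns)
        best = max(best, total)
    return best


# Hypothetical stratum with two characteristics per unit (illustrative only).
stratum = [(55.0, 70.0), (62.0, 74.0), (48.0, 69.0), (71.0, 80.0), (66.0, 77.0)]
rng = random.Random(2016)
print(adversary_max(stratum, n_h=3, weight=0.5, draws=10_000, rng=rng))
```

Because the loop keeps a running maximum, adding more draws can only raise the reported value or leave it unchanged, which is consistent with the remark that simulated results may differ between runs.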

### Solution of minimax game

For the outer segment of model (7), we can use any suitable goal programming technique discussed in ([22, 23, 28], [29–31], [34] and [35]). Program Eq (7) can be expressed as a Weighted Goal Programming (WGP) model whose terms are the sums of the optimal Adversary’s objectives for stratum 1 and stratum 2, respectively. The objective function can be written more precisely using $W_h$, the $h$th stratum weight used in Eq (2). The program is run in the General Algebraic Modeling System (GAMS) to obtain the optimum results, which are highlighted in Table 2 and visualized in Fig 1.

### Discussion on results

For our example, we found that the total weighted variation (summed over characteristics) in the first stratum ranges from 0.0305 to 0.093 and, as expected, is lower for larger sample sizes. In the second stratum it ranges from 0.00105 to 0.0335. These results are based on simulation and may differ in another run [36]. We simulated the results for a large number of samples (more than 20 million in some cases). Comparing the fluctuations in the two characteristics, the results fluctuate more in the first characteristic (‘test plus interview’ marks) than in the second (‘academic record’ marks) in both strata (see Table 2). The optimal value of this game is 0.01299275, with optimal sampling strategy (6, 3).

In the literature, sample selection is frequently discussed for the case in which the sampling frame is known. Our methodology, however, is suitable even if the sampling frame is unavailable. It addresses the adverse-case scenario, whereas the focus is generally on minimizing the estimates of variation. This sampling strategy shows another side of the picture.

This study has the following limitations. First, we chose weighted goal programming to solve the inner maximization problem that determines the payoff matrix of the sampler. However, one could apply various other methods, such as lexicographic, extended lexicographic, fuzzy programming and value function techniques. The results may be even more interesting for a different choice of weights. Second, we used a standard weight vector to solve the minimax game; other weight vectors may be used for the outer minimization problem.

## Acknowledgments

We are thankful to Dalia Bach, University of Columbia, and Muhammad Faisal, University of Bradford, Faculty of Health Studies, Bradford, UK, who helped us to improve the language of this paper. The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group no. RG-1437-027.

## Author Contributions

1. Conceptualization: YSM.
2. Data curation: YSM.
3. Formal analysis: YSM.
4. Funding acquisition: YSM.
5. Investigation: YSM.
6. Methodology: YSM.
8. Resources: AMS.
9. Software: IH.
10. Supervision: YSM.
11. Validation: IH.
12. Visualization: IH AMS.
13. Writing – original draft: YSM.
14. Writing – review & editing: IH AMS.

## References

1. Andersen R, Kasper JD, Frankel MR. Total survey error. Jossey-Bass Publishers; 1979.
2. Cochran WG. Sampling Techniques. 3rd ed. New York: John Wiley & Sons; 1977.
3. Groves RM. Survey errors and survey costs. vol. 536. John Wiley & Sons; 2004.
4. Kish L. Survey Sampling. New York: John Wiley & Sons; 1965.
5. Kish L. Optima and proxima in linear sample designs. Journal of the Royal Statistical Society Series A (General). 1976; p. 80–95.
6. Linacre SJ, Trewin DJ. Total survey design-application to a collection of the construction industry. Journal of Official Statistics. 1993;9:611.
7. Blackwell DA, Girshick MA. Theory of games and statistical decisions. Courier Corporation; 1979.
8. Aggarwal OP. Bayes and Minimax Procedures in Sampling From Finite and Infinite Populations–I. The Annals of Mathematical Statistics. 1959; p. 206–218.
9. Ghosh JK. A game theory approach to the problem of optimum allocation in stratified sampling with multiple characters. Calcutta Statistical Association Bulletin. 1963;12:4–12.
10. Kokan A, Khan S. Optimum allocation in multivariate surveys: An analytical solution. Journal of the Royal Statistical Society Series B (Methodological). 1967; p. 115–125.
11. Ericson WA. Optimum stratified sampling using prior information. Journal of the American Statistical Association. 1965;60(311):750–771.
12. Chatterjee S. Multivariate stratified surveys. Journal of the American Statistical Association. 1968;63(322):530–534.
13. Yates F, et al. Sampling methods for censuses and surveys. 1960;54.
14. Xia CY, Meloni S, Moreno Y. Effects of environment knowledge on agglomeration and cooperation in spatial public goods games. Advances in Complex Systems. 2012;15(supp01):1250056.
15. Xia CY, Meloni S, Perc M, Moreno Y. Dynamic instability of cooperation due to diverse activity patterns in evolutionary social dilemmas. EPL (Europhysics Letters). 2015;109(5):58002.
16. Zhu CJ, Sun SW, Wang L, Ding S, Wang J, Xia CY. Promotion of cooperation due to diversity of players in the spatial public goods game with increasing neighborhood size. Physica A: Statistical Mechanics and its Applications. 2014;406:145–154.
17. Xia C, Miao Q, Wang J, Ding S. Evolution of cooperation in the traveler’s dilemma game on two coupled lattices. Applied Mathematics and Computation. 2014;246:389–398.
18. Chen MH, Wang L, Sun SW, Wang J, Xia CY. Evolution of cooperation in the spatial public goods game with adaptive reputation assortment. Physics Letters A. 2016;380(1):40–47.
19. Varshney R, Ahsan M, Khan MG. An optimum multivariate stratified sampling design with nonresponse: a lexicographic goal programming approach. Journal of Mathematical Modelling and Algorithms. 2011;10(4):393–405.
20. Ansari A, Najmussehar AM, Ahsan M. On multiple response stratified random sampling design. J Stat Sci Kolkata, India. 2009;1(1):45–54.
21. Bethel JW. An optimum allocation algorithm for multivariate surveys. US Department of Agriculture, Statistical Reporting Service, Statistical Research Division; 1986.
22. Chromy JR. Design optimization with multiple objectives. Proceedings of the Section. 1987.
23. Dalenius T, Gurney M. The problem of optimum stratification. II. Scandinavian Actuarial Journal. 1951;1951(1–2):133–148.
24. Dalenius T. Sampling in Sweden: contributions to the methods and theories of sample practice. Almqvist & Wiksell; 1957.
25. Ghosh S. A note on stratified random sampling with multiple characters. Calcutta Statistical Association Bulletin. 1958;8(30–31):81–90.
26. Folks JL, Antle CE. Optimum allocation of sampling units to strata when there are R responses of interest. Journal of the American Statistical Association. 1965;60(309):225–233.
27. Jahan N, Khan M, Ahsan M. A generalized compromise allocation. Journal of the Indian Statistical Association. 1994;32:95–101.
28. Jahan N, Khan M, Ahsan M. Optimum compromise allocation using dynamic programming. Dhaka Univ J Sci. 2001;49(2):197–202.
29. Khan E, Khan MG, Ahsan M. Optimum stratification: a mathematical programming approach. Calcutta Statistical Association Bulletin. 2002;52:323–333.
30. Khan MG, Khan E, Ahsan M. Optimum allocation in multivariate stratified sampling in presence of non-response. J Ind Soc Agril Statist. 2008;62(1):42–48.
31. Khan MG, Maiti T, Ahsan M. An optimal multivariate stratified sampling design using auxiliary information: an integer solution using goal programming approach. Journal of Official Statistics. 2010;26(4):695.
32. Varshney R, Ahsan M, et al. Estimation of more than one parameters in stratified sampling with fixed budget. Mathematical Methods of Operations Research. 2012;75(2):185–197.
33. Swindel BF, Yandle DO. Allocation in stratified sampling as a game. Journal of the American Statistical Association. 1972;67(339):684–686.
34. Romero C. Extended lexicographic goal programming: a unifying approach. Omega. 2001;29(1):63–71.
35. Tamiz M, Jones D, Romero C. Goal programming for decision making: An overview of the current state-of-the-art. European Journal of Operational Research. 1998;111(3):569–581.
36. Kozak M. On sample allocation in multivariate surveys. Communications in Statistics - Simulation and Computation. 2006;35(4):901–910.