The Efficiency of Split Panel Designs in an Analysis of Variance Model

We consider the efficiency of split panel designs in analysis-of-variance models, that is, the determination of the optimal proportion of cross-sections in all samples that minimizes the variances of the best linear unbiased estimators of linear combinations of the parameters. An orthogonal matrix is constructed to obtain manageable expressions for these variances. On this basis, we derive a theorem for analyzing split panel design efficiency irrespective of the parameters of interest and the budget. Additionally, we present the relative efficiency of the estimator based on the split panel to an estimator based on a pure panel or a pure series of cross-sections. The analysis shows that the gains from a split panel can be quite substantial. We further consider the efficiency of split panel designs under a given budget and transform the problem into a constrained nonlinear integer program, for which we design an efficient algorithm. Moreover, we combine one-at-a-time designs and factorial designs to illustrate the algorithm's efficiency with an empirical example concerning monthly consumer expenditure on food in the Netherlands in 1985, and we give efficient ranges of the algorithm parameters that ensure a good solution.


Introduction
A split panel combines the advantages of other sampling designs (including repeated, cross-sectional, and rotating samples) and provides rich, convenient, and practical information; it has been widely applied in many fields [1]. In economic survey research, researchers typically consider statistical models that allow for complex relationships, and data must be collected to estimate the parameters of these models. As a newer sample type that combines the advantages of the other three basic sample types, the split panel can provide rich data for such complex models. With the wide application of microeconomic data, panel conditioning and panel nonresponse have become more important in econometrics, as is well documented in the literature [1]. A split panel, as a combination of a panel and a repeated cross-section or rotating panel, uses changing samples to recruit replacements and so mitigates panel conditioning and panel nonresponse [2][3]. In many fields, such as finance, labor economics, and political economy, data collection is costly. A split panel offers flexibility in the sizes of the cross-sections and continually updated information [4][5]. It is therefore very important to combine a split panel with an optimal sample design to obtain as much information as possible from a given budget. In recent years, the theory of split panel samples has seen theoretical advances and applications across the pure and applied sciences, and it is likely to be widely used in the future [6][7]. However, little attention has recently been paid to the analysis of split panel design efficiency. In the early literature, the estimation of a time-dependent mean from several kinds of rotating samples (a special form of split panel) and the resulting variances were examined by Patterson [8] and Eckler [9].
It has been documented that the optimal sample design depends on the parameter of interest (see [10], p. 152). On this basis, Nijman et al. [11] determined the optimal split panel design, that is, the optimal proportion of a given budget to spend on collecting a series of cross-sections so as to maximize the efficiency of the design (i.e., minimize the variance of the estimator). However, in sampling we need the optimal proportion of a series of cross-sections in all samples, and this cannot be obtained accurately from the proportion of the budget spent on collecting the cross-sections [11]. Consequently, the results of [11] cannot be used directly to save sampling costs. Moreover, [11] does not provide an algorithm for optimizing the split panel design. To solve for the optimal design, researchers and practitioners must select or design an appropriate optimization algorithm themselves and compute the optimal proportion using optimization theory, which reduces both the efficiency of computing the optimal proportion and the accuracy of the solution. Hence, the framework of [11] is not convenient for designing a split panel in practice.
In this paper, the goal is to maximize the efficiency of the split panel design in the analysis-of-variance model, i.e., to minimize the variances of the estimators, by determining the optimal proportion of a series of cross-sections in all samples, a quantity that can be applied directly in sampling. This extends [11], and the main contributions can be summarized as follows. First, we show how to choose the proportion of a series of cross-sections in all samples to minimize the variances of the estimators in the analysis-of-variance model, irrespective of the parameters of interest and the budget. In particular, we present the relative efficiency of the estimator based on the split panel to an estimator based on a pure panel or a pure series of cross-sections. Second, we transform the efficiency of the split panel design under a budget constraint into a constrained nonlinear integer optimization problem, which is difficult to solve with standard optimization algorithms. The simulated annealing algorithm has the advantages of being able to escape local optima (it converges to a global optimum in probability under a sufficiently slow cooling schedule), accepting a randomly selected initial solution, and being simple and practical [12]. However, when simulated annealing is used to solve the constrained nonlinear integer optimization associated with split panel design efficiency under a budget constraint, it is difficult to choose a combination of parameters, such as the inner iteration number, the initial temperature, and the temperature decrease rate, that gives the best performance. To our knowledge, there is as yet no theoretical method for making this choice. Therefore, we design an efficient algorithm, based on simulated annealing, to solve this constrained nonlinear integer optimization. In numerical modeling, sensitivity analysis studies how different values of an independent variable affect a particular dependent variable under a given set of assumptions.
It has been widely applied in many fields, such as economics, engineering, and ecology, and it allows modelers to determine whether the parameters of a model or algorithm give reliable predictions. Hence, third, we introduce sensitivity analysis to appraise the parameters of the proposed algorithm. The one-at-a-time design (OATD) method is one of the most common approaches for assessing the effect of an input on the output [13][14][15][16], and it is frequently used because, in case of model failure, the modeler immediately knows which input factor is responsible [17]. However, because it varies only one factor at a time, the OATD method cannot detect interactions between factors. The factorial design (FD) method differs in that it varies the levels of all factors simultaneously, so it can estimate both the effects of several factors on a response and the interactions between them. The OATD and FD methods are therefore combined to analyze the effects of the parameters and to compensate for the deficiencies of each single method [16]. In addition, the simulated annealing algorithm imposes no special requirements on the problem, and its performance does not change across examples [12]. Hence, using an empirical example concerning monthly consumer expenditure on food in the Netherlands in 1985, we combine the OATD and FD methods to analyze the algorithm designed in this paper. The result is a set of efficient ranges for the algorithm parameters that ensure a good solution (i.e., an accurate optimal proportion). The results of this paper should therefore be useful to both researchers and practitioners in sampling.
This paper is organized as follows. Section 2, based on the analysis-of-variance model, transforms the efficiency of the split panel design into a nonlinear optimization problem. Section 3 discusses the efficiency of the split panel design irrespective of the parameters of interest and the budget. Section 4 considers the efficiency of the split panel design under a budget constraint and designs an efficient algorithm based on simulated annealing to solve the resulting constrained nonlinear integer optimization. Section 5 combines the OATD and FD methods to illustrate the algorithm's efficiency with an empirical example of monthly consumer expenditure on food in the Netherlands in 1985, and gives efficient ranges of the algorithm parameters that ensure a good solution. Section 6 concludes the paper.

Theoretical results on the variances of the parameter estimators
In this paper, we consider split panel design efficiency by minimizing the variance of the best linear unbiased estimator of the linear combination $\sum_{t=1}^{T}\phi_t\hat\beta_t$ of the period means $\hat\beta_t$ in the analysis-of-variance model

$$y_{it} = \beta_t + \alpha_i + \varepsilon_{it}, \qquad (1)$$

where $i = 1,\dots,N$, $t = 1,\dots,T$ and $\phi' = (\phi_1,\phi_2,\dots,\phi_T)$; the $\alpha_i$ and $\varepsilon_{it}$ are independent and identically distributed (i.i.d.) random variables with mean 0 and variances $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$, respectively, and are mutually independent. Throughout this paper we assume, for simplicity, that $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$ are known a priori. If they are unknown, consistent estimators can be used in their place, and the same results hold asymptotically [18]. Important special cases are the determination of the optimal design when the parameter of interest is the period mean $\beta_t$, the change between two subsequent period means $\beta_t - \beta_{t-1}$, or the overall average of the period means $T^{-1}\sum_{t=1}^{T}\beta_t$. We denote the sample size in each wave by $N$ and the proportion of cross-sections in all samples by $\lambda$, so that $\lambda N$ individuals in each wave belong to the cross-sections, while the remaining $(1-\lambda)N$ individuals are re-interviewed every period. To determine the optimal value of $\lambda$, we first derive the efficient estimator and its variance. It is well known that the estimator of $\beta' = (\beta_1,\beta_2,\dots,\beta_T)$ in Eq (1) that uses only the information on the individuals re-interviewed every period is the best linear unbiased estimator; we denote it by $\hat\beta_p$ [11].
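To make the model concrete, the following Monte Carlo sketch simulates a pure panel and compares the covariance of the panel period means with $(\sigma_\varepsilon^2 I_T + \sigma_\alpha^2\iota\iota')/n$, where $\iota$ is a $T$-vector of ones. It assumes Model (1) has the form $y_{it} = \beta_t + \alpha_i + \varepsilon_{it}$, and it draws normal effects purely for the simulation (normality is our assumption, not the paper's). For a balanced panel the best linear unbiased estimator $\hat\beta_p$ reduces to the vector of period sample means, which is what the sketch computes. All function names are hypothetical.

```python
import random


def simulate_panel_means(beta, sigma_alpha2, sigma_eps2, n, rng):
    """Period means of one simulated pure panel with n individuals.

    Assumes y_it = beta_t + alpha_i + eps_it with alpha_i ~ N(0, sigma_alpha2)
    and eps_it ~ N(0, sigma_eps2), all independent (normality is only for the
    simulation).
    """
    T = len(beta)
    sums = [0.0] * T
    for _ in range(n):
        alpha = rng.gauss(0.0, sigma_alpha2 ** 0.5)  # same individual effect in every wave
        for t in range(T):
            sums[t] += beta[t] + alpha + rng.gauss(0.0, sigma_eps2 ** 0.5)
    return [s / n for s in sums]


def mc_covariance(beta, sigma_alpha2, sigma_eps2, n, reps, seed=0):
    """Monte Carlo estimate of the covariance matrix of the panel period means."""
    rng = random.Random(seed)
    draws = [simulate_panel_means(beta, sigma_alpha2, sigma_eps2, n, rng)
             for _ in range(reps)]
    T = len(beta)
    mean = [sum(d[t] for d in draws) / reps for t in range(T)]
    return [[sum((d[s] - mean[s]) * (d[t] - mean[t]) for d in draws) / (reps - 1)
             for t in range(T)]
            for s in range(T)]


def theoretical_covariance(T, sigma_alpha2, sigma_eps2, n):
    """(sigma_eps2 * I_T + sigma_alpha2 * iota iota') / n."""
    return [[(sigma_eps2 * (s == t) + sigma_alpha2) / n for t in range(T)]
            for s in range(T)]


if __name__ == "__main__":
    emp = mc_covariance([1.0, 2.0, 3.0], 0.6, 0.4, n=50, reps=4000)
    thr = theoretical_covariance(3, 0.6, 0.4, n=50)
    for row_e, row_t in zip(emp, thr):
        print([round(v, 4) for v in row_e], [round(v, 4) for v in row_t])
```

The off-diagonal entries are nonzero because the individual effect $\alpha_i$ is shared across waves; this is what makes the panel and cross-section estimators differ in efficiency.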
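Analogously, the estimator based on the cross-section information only, denoted $\hat\beta_{cs}$, is also the best linear unbiased estimator. Write the covariance matrices of $\hat\beta_p$ and $\hat\beta_{cs}$ as $V_p/\{(1-\lambda)N\}$ and $V_{cs}/(\lambda N)$, respectively. Since $\hat\beta_p$ and $\hat\beta_{cs}$ are independent, based on the theory of estimation from two samples [10,11], the best linear unbiased estimator that uses all the samples is given by

$$\hat\beta = \{\lambda V_{cs}^{-1} + (1-\lambda)V_p^{-1}\}^{-1}\{\lambda V_{cs}^{-1}\hat\beta_{cs} + (1-\lambda)V_p^{-1}\hat\beta_p\},$$

and it is easily verified that

$$\operatorname{Var}(\phi'\hat\beta) = N^{-1}\phi'\{\lambda V_{cs}^{-1} + (1-\lambda)V_p^{-1}\}^{-1}\phi.$$

Consequently, the efficiency of the split panel design can be transformed into the following nonlinear optimization, which minimizes the variance of the best unbiased estimator $\phi'\hat\beta$:

$$\min_{0 \le \lambda \le 1}\ N^{-1}\phi'\{\lambda V_{cs}^{-1} + (1-\lambda)V_p^{-1}\}^{-1}\phi. \qquad (9)$$

To obtain the optimal $\lambda$ from Eq (9), we derive a manageable expression for the variance of the best unbiased estimator $\phi'\hat\beta$. First, under Model (1), $V_p^{-1}$ and $V_{cs}^{-1}$ can be written as

$$V_p^{-1} = \frac{1}{\sigma_\varepsilon^2}\left(I_T - \frac{\sigma_\alpha^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2}\,\iota\iota'\right) \quad\text{and}\quad V_{cs}^{-1} = \frac{1}{\sigma_\alpha^2 + \sigma_\varepsilon^2}\,I_T,$$

where $\iota$ is a $T$-vector of ones. Since $V_{cs}^{-1}$ is a constant multiple of the identity matrix and $V_p^{-1}$ is symmetric, there exists an orthogonal matrix $Q$ such that $Q^T V_p^{-1} Q = D$ while $Q^T V_{cs}^{-1} Q = V_{cs}^{-1}$, where $D$ is the diagonal matrix of eigenvalues of $V_p^{-1}$ given in Eq (14), and the orthogonal matrix $Q$ is given in Eq (15). The construction of $Q$ is presented in Appendix A1.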
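The combined-sample variance $N^{-1}\phi'\{\lambda V_{cs}^{-1} + (1-\lambda)V_p^{-1}\}^{-1}\phi$ can be evaluated directly. The pure-Python sketch below assumes the per-individual covariance matrices implied by Model (1) in the form $y_{it} = \beta_t + \alpha_i + \varepsilon_{it}$, namely $V_p = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2\iota\iota'$ and $V_{cs} = (\sigma_\alpha^2 + \sigma_\varepsilon^2) I_T$; this form is our assumption. It solves a linear system instead of forming the outer inverse. The helper names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def inverse(A):
    """Matrix inverse, column by column, via solve()."""
    n = len(A)
    cols = [solve(A, [float(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def split_panel_variance(phi, lam, N, sigma_alpha2, sigma_eps2):
    """N^{-1} phi' {lam V_cs^{-1} + (1 - lam) V_p^{-1}}^{-1} phi.

    Assumed per-individual covariances: V_p = sigma_eps2 I + sigma_alpha2 iota iota'
    (panel) and V_cs = (sigma_alpha2 + sigma_eps2) I (cross-sections).
    """
    T = len(phi)
    V_p = [[sigma_eps2 * (s == t) + sigma_alpha2 for t in range(T)] for s in range(T)]
    Vp_inv = inverse(V_p)
    cs_prec = 1.0 / (sigma_alpha2 + sigma_eps2)  # V_cs^{-1} is a multiple of I
    A = [[lam * cs_prec * (s == t) + (1.0 - lam) * Vp_inv[s][t] for t in range(T)]
         for s in range(T)]
    x = solve(A, phi)  # x = A^{-1} phi, avoiding an explicit outer inverse
    return sum(p * xi for p, xi in zip(phi, x)) / N


if __name__ == "__main__":
    e1 = [1.0, 0.0, 0.0, 0.0]  # parameter of interest: the first period mean
    for lam in (0.0, 0.5, 1.0):
        print(lam, split_panel_variance(e1, lam, N=100, sigma_alpha2=0.6, sigma_eps2=0.4))
```

For a single period mean, $\lambda = 0$ and $\lambda = 1$ give the same variance (0.01 here), consistent with the remark before Table 1 that pure designs are equally efficient in this case, while $\lambda = 0.5$ gives a smaller one.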
As such, the variance of the best unbiased estimator $\phi'\hat\beta$ using all the samples can be expressed through Eqs (16) and (17). We denote $\phi' Q = (\delta_1,\dots,\delta_T) = \delta'$ to simplify Eq (16) and rewrite it in terms of $\delta$. Consequently, the nonlinear optimization that minimizes the variance of the best unbiased estimator takes the form of Eq (19).

Split panel design efficiency irrespective of the parameters of interest and budget

By considering linear combinations of the vector $\beta$, we can easily adapt the results to an individual element, a difference of elements, or the overall average. In this section, we therefore derive a theorem for split panel design efficiency, irrespective of the parameters of interest and the budget, using Eq (19).

Theorem 1
(i) A pure panel ($\lambda = 0$) minimizes the variance of the best unbiased estimator $\phi'\hat\beta$ if …;
(ii) a pure series of cross-sections ($\lambda = 1$) minimizes the variance of $\phi'\hat\beta$ if …;
(iii) if …, a split panel with $\lambda = k_r$ minimizes the variance of $\phi'\hat\beta$;
(iv) if …, then, irrespective of the choice of $\phi$, a split panel with $\lambda = k_l$ minimizes the variance of $\phi'\hat\beta$;
(v) if …, then, irrespective of the choice of $\phi$, a split panel with $\lambda = \lambda_0$ minimizes the variance of $\phi'\hat\beta$;
where $k_l$ and $k_r$ are the left and right roots of $k(\lambda)$; if …, then …; if $M(-c/b) > M(1)$, then $\lambda_0 = 1$; and $b = \frac{2\rho^2(T-1)}{1-\rho}$, …. The proof of Theorem 1 is presented in Appendix 1B.
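The simplification through $Q$ and $\delta$ that underlies Eq (19) and Theorem 1 can be checked numerically. In the sketch below, $Q$ is taken to be a Householder reflection $I - 2ww'/(w'w)$ with $w = e_1 - \iota/\sqrt{T}$, whose first column is $\iota/\sqrt{T}$. This specific construction is our assumption, chosen because $\iota$ is an eigenvector of $V_p$; Appendix A1 gives the paper's own construction. Under the covariance forms $V_p = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2\iota\iota'$ and $V_{cs} = (\sigma_\alpha^2+\sigma_\varepsilon^2)I_T$ (also assumed), $D = \mathrm{diag}(d_1,\dots,d_T)$ with $d_1 = (\sigma_\varepsilon^2 + T\sigma_\alpha^2)^{-1}$ and $d_j = \sigma_\varepsilon^{-2}$ for $j \ge 2$, and the variance becomes $N^{-1}\sum_j \delta_j^2 / \{\lambda(\sigma_\alpha^2+\sigma_\varepsilon^2)^{-1} + (1-\lambda)d_j\}$. Function names are hypothetical.

```python
import math


def householder_Q(T):
    """Symmetric orthogonal Q = I - 2 w w'/(w'w) with first column iota/sqrt(T), T >= 2."""
    w = [(1.0 if i == 0 else 0.0) - 1.0 / math.sqrt(T) for i in range(T)]  # e_1 - iota/sqrt(T)
    ww = sum(v * v for v in w)
    return [[float(i == j) - 2.0 * w[i] * w[j] / ww for j in range(T)] for i in range(T)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def variance_via_delta(phi, lam, N, sigma_alpha2, sigma_eps2):
    """N^{-1} sum_j delta_j^2 / (lam / (sigma_a^2 + sigma_e^2) + (1 - lam) d_j), delta' = phi' Q."""
    T = len(phi)
    Q = householder_Q(T)
    delta = [sum(phi[k] * Q[k][j] for k in range(T)) for j in range(T)]
    # Diagonal of D = Q' V_p^{-1} Q: the iota direction first, then T - 1 equal eigenvalues.
    d = [1.0 / (sigma_eps2 + T * sigma_alpha2)] + [1.0 / sigma_eps2] * (T - 1)
    cs_prec = 1.0 / (sigma_alpha2 + sigma_eps2)
    return sum(dl * dl / (lam * cs_prec + (1.0 - lam) * dj) for dl, dj in zip(delta, d)) / N


if __name__ == "__main__":
    T, sa2, se2 = 4, 0.6, 0.4
    Q = householder_Q(T)
    V_p = [[se2 * (s == t) + sa2 for t in range(T)] for s in range(T)]
    print([[round(v, 10) for v in row] for row in matmul(transpose(Q), matmul(V_p, Q))])
    print(variance_via_delta([1.0, 0.0, 0.0, 0.0], 0.5, 1, sa2, se2))
```

The printed $Q^T V_p Q$ is numerically diagonal, so every term of the variance depends on $\lambda$ only through a scalar denominator; this is what makes the case analysis in Theorem 1 tractable.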
From Theorem 1, it can easily be checked that $\hat\beta_t - \hat\beta_{t-1}$ has the smallest variance if a pure panel ($\lambda = 0$) is used. Likewise, a pure series of cross-sections ($\lambda = 1$) is optimal if the overall average of the period means is to be estimated.
To illustrate that the split panel design is preferable to a pure panel or a pure series of cross-sections in most cases, and how much efficiency is lost if a suboptimal choice is made when the period mean $\beta_t$ is the parameter of interest, Table 1 presents the relative efficiency of the estimator based on the split panel to an estimator based on a pure series of cross-sections or a pure panel (in this case, a pure series of cross-sections and a pure panel yield equally efficient estimators). Similar to [18], we consider observation periods $T = 3, 6, 12, 20$, variance-component proportions $\rho = 0.3, 0.6, 0.9$, and proportions of cross-sections in the split panel $\lambda = 1/2, 1/3, 1/4, 1/8, 1/12$.
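The consequences of Theorem 1 and the kind of comparison in Table 1 can be explored with a closed form. The sketch assumes the covariance forms $V_p = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2\iota\iota'$ and $V_{cs} = (\sigma_\alpha^2+\sigma_\varepsilon^2)I_T$, normalizes $\sigma_\alpha^2 + \sigma_\varepsilon^2 = 1$, and reads $\rho$ as $\sigma_\alpha^2/(\sigma_\alpha^2+\sigma_\varepsilon^2)$; these are our assumptions. Then $N\operatorname{Var}(\phi'\hat\beta)$ splits into the squared component of $\phi$ along $\iota/\sqrt{T}$, divided by $\lambda + (1-\lambda)/\{1+(T-1)\rho\}$, plus the remaining squared norm, divided by $\lambda + (1-\lambda)/(1-\rho)$. Relative efficiency is defined here, also as an assumption, as the pure-design variance divided by the split-panel variance at equal $N$ for a period mean. The sketch is not claimed to reproduce Table 1, and the function names are hypothetical.

```python
def n_var(phi, lam, rho):
    """N * Var(phi' beta_hat) for a split panel with cross-section share lam.

    Assumes sigma_alpha^2 + sigma_eps^2 = 1 and rho = sigma_alpha^2.
    """
    T = len(phi)
    along = sum(phi) ** 2 / T                          # delta_1^2: component along iota/sqrt(T)
    ortho = max(0.0, sum(p * p for p in phi) - along)  # sum of the remaining delta_j^2
    return (along / (lam + (1 - lam) / (1 + (T - 1) * rho))
            + ortho / (lam + (1 - lam) / (1 - rho)))


def best_lambda(phi, rho, grid=1000):
    """Grid search for the variance-minimizing lam in [0, 1]."""
    return min((i / grid for i in range(grid + 1)), key=lambda lam: n_var(phi, lam, rho))


def relative_efficiency(T, rho, lam):
    """Var under a pure design / Var under the split panel, period mean, equal N."""
    e1 = [1.0] + [0.0] * (T - 1)
    return n_var(e1, 0.0, rho) / n_var(e1, lam, rho)


if __name__ == "__main__":
    T, rho = 6, 0.6
    diff = [-1.0, 1.0] + [0.0] * (T - 2)  # beta_2 - beta_1
    avg = [1.0 / T] * T                    # overall average of the period means
    print(best_lambda(diff, rho), best_lambda(avg, rho))
    for t in (3, 6, 12, 20):
        print(t, [round(relative_efficiency(t, r, 0.5), 3) for r in (0.3, 0.6, 0.9)])
```

Under these assumptions the grid search returns $\lambda = 0$ for the difference and $\lambda = 1$ for the overall average, matching the remark after Theorem 1; relative efficiencies above 1 mean the split panel is the more efficient design.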

Split panel design efficiency with budget constraint and main algorithm
In contrast to the previous section, where we analyzed split panel design efficiency irrespective of the parameters of interest and the budget, this section considers split panel design efficiency under a given budget. Let $p_1$ denote the average cost of observing an individual in the cross-sections and $p_2$ the average cost of observing an individual in the panel. As shown by Duncan et al. [19], a cross-sectional survey costs 30% to 70% more than an additional wave of the Panel Study of Income Dynamics, so $p_1/p_2$ lies between 1.3 and 1.7. Therefore, approximately, $0.6 < p_2/p_1 < 0.8$. If the budget for all periods is $C$, we obtain the constrained nonlinear integer optimization (P1):

$$\min\ z(\lambda, N) = N^{-1}\phi'\{\lambda V_{cs}^{-1} + (1-\lambda)V_p^{-1}\}^{-1}\phi, \qquad (31)$$
$$\text{s.t.}\quad \lambda N p_1 + (1-\lambda) N p_2 \le C^*, \qquad (32)$$

where $C^* = C/T$. Applying Eq (19) and setting $\lambda N = x$ and $N = y$, under which constraint (32) becomes $x p_1 + (y - x) p_2 \le C^*$, we obtain the constrained nonlinear integer optimization (P2), with objective (35) and constraint (36), over $x, y \in \mathbb{N}^*$. Eq (35) is the objective function, which minimizes the variance of the best linear unbiased estimator of the linear combination of the period means, while Eq (36) enforces the given budget.
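Because $x$ and $y$ are integers, small instances of (P2) can be solved exhaustively, which gives a reference for checking a heuristic solver. The sketch keeps the earlier assumptions ($V_p = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2\iota\iota'$, $V_{cs} = (\sigma_\alpha^2+\sigma_\varepsilon^2)I_T$, $\sigma_\alpha^2+\sigma_\varepsilon^2 = 1$, $\rho = \sigma_\alpha^2$) and writes the objective directly in $x$ and $y$: with $x$ cross-section and $y - x$ panel individuals per wave, the precision along $\iota$ is $x + (y-x)/\{1+(T-1)\rho\}$ and orthogonal to it is $x + (y-x)/(1-\rho)$. Both grow with $y$ for fixed $x$, so only the largest affordable $y$ needs to be checked for each $x$. Function names are hypothetical.

```python
def var_xy(phi, x, y, rho):
    """Var(phi' beta_hat) with x cross-section and y - x panel individuals per wave.

    Assumes sigma_alpha^2 + sigma_eps^2 = 1 and rho = sigma_alpha^2.
    """
    T = len(phi)
    along = sum(phi) ** 2 / T
    ortho = max(0.0, sum(p * p for p in phi) - along)
    panel = y - x
    return (along / (x + panel / (1 + (T - 1) * rho))
            + ortho / (x + panel / (1 - rho)))


def solve_p2_exhaustive(phi, rho, C, p1, p2):
    """Best (variance, x, y) over integers 1 <= x <= y with x*p1 + (y - x)*p2 <= C/T.

    Returns None if not even one cross-section individual is affordable.
    """
    T = len(phi)
    c_star = C / T
    best = None
    for x in range(1, int(c_star // p1) + 1):
        # Variance falls as y grows for fixed x, so only the largest affordable y matters.
        y = x + int((c_star - x * p1) // p2)
        v = var_xy(phi, x, y, rho)
        if best is None or v < best[0]:
            best = (v, x, y)
    return best


if __name__ == "__main__":
    print(solve_p2_exhaustive([1.0, 0.0, 0.0], rho=0.5, C=1200.0, p1=10.0, p2=6.0))
```

The loop runs over at most $C^*/p_1$ values of $x$, so exhaustive search stays cheap for moderate budgets and can serve as a check on the simulated annealing solution.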
Algorithm design. Above, we transformed the efficiency of the split panel design into the constrained nonlinear integer optimization (P2), which is nonetheless difficult to solve with standard optimization algorithms. Simulated annealing has the advantages of being able to escape local optima, accepting a randomly selected initial solution, and being simple and practical. However, when it is used to solve the constrained nonlinear integer optimization associated with split panel design efficiency under a budget, it is difficult to choose a combination of parameters, such as the inner iteration number, the initial temperature, and the temperature decrease rate, that gives the best performance. Consequently, we design an efficient algorithm, based on simulated annealing, to solve the constrained nonlinear integer optimization associated with split panel design efficiency under a given budget.
The steps of the simulated annealing algorithm designed to solve (P2) are as follows.
1. Choose an initial integer solution $X_0 \in D$ and an initial temperature $T_0 > 0$, where $D$ is the feasible region formed by Eqs (4) and (5); calculate $f(X_0)$ and let $X_{\min} = X_0$, $f_{\min} = f(X_0)$ and $K = 0$.
2. Randomly generate the integer vector $Z_K = (z_1^K, \dots, z_n^K)$, where …, and $z_i^K$ is the $i$th component of the random vector $Z_K$; $U_1, U_2, \dots, U_n$ are mutually independent random variables distributed uniformly over $[-1, 1]$; $\mathrm{sign}(\cdot)$ is the sign function; and $\langle\cdot\rangle$ denotes rounding to the nearest integer.
3. Use the current iteration point $X_K$ and the random vector $Z_K$ to generate a new iteration point $Y_K = X_K + Z_K$. If $Y_K \in D$, go to the next step; if $Y_K \notin D$, recompute $Y_K$ by … for $l = 1, 2, \dots, N_1$ until $Y_K \in D$, and then go to the next step. If $Y_K \notin D$ after $N_1$ steps, let $Y_K = X_K$ and go to the next step.
4. Generate a random number $\eta$ distributed uniformly over $[0, 1]$, and calculate the acceptance probability $P_a(Y_K \mid X_K, T_K)$ from the current iteration point $X_K$ and the new iteration point $Y_K$.
5. If $\eta \le P_a(Y_K \mid X_K, T_K)$, (44) let $X_{K+1} = Y_K$; otherwise, let $X_{K+1} = X_K$.
6. If $f(X_{K+1}) < f_{\min}$, let $X_{\min} = X_{K+1}$ and $f_{\min} = f(X_{K+1})$.
7. If the stopping criterion … is satisfied, stop and regard $X_{\min}$ and $f_{\min}$ as the approximate global optimal solution and the corresponding optimal value, respectively; otherwise, go to the next step.
8. Generate a new temperature $T_{K+1}$ using the given temperature update function …, let $K = K + 1$, and return to Step 2.
The detailed design process of the simulated annealing algorithm is presented in Appendix 1C.
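The steps above can be sketched in Python. The move-generation formula, repair rule, acceptance probability, stopping criterion, and temperature update are not reproduced in this text, so the sketch fills them with common choices, all of which are assumptions: moves $\langle U_i s\rangle$ with $U_i \sim U[-1,1]$ and a hypothetical step size $s$, repair by halving the move, the Metropolis acceptance probability $\min\{1, \exp(-(f(Y_K)-f(X_K))/T_K)\}$, geometric cooling $T_{K+1} = mT_K$ with a fixed number of trials per temperature, and stopping once the temperature falls below $\varepsilon$. Parameter names follow the Results section notation ($E_0$, $m$, $B$, $\varepsilon$); the default inner iteration count is smaller than the benchmark value for speed, and the objective uses the same assumed covariance forms as before.

```python
import math
import random


def simulated_annealing_int(f, feasible, x0, e0=10_000.0, m=0.75, inner_iters=200,
                            eps=1e-4, step=5, n_repair=10, seed=0):
    """Minimize f over integer vectors x with feasible(x) True (x0 must be feasible).

    Assumed choices (not taken from the source): moves round(U * step) with
    U ~ Uniform[-1, 1], repair by halving the move, Metropolis acceptance,
    geometric cooling t <- m * t with inner_iters (B) trials per temperature,
    and stopping once t < eps.
    """
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    x_min, f_min = list(x), fx
    t = e0
    while t >= eps:
        for _ in range(inner_iters):
            z = [round(rng.uniform(-1.0, 1.0) * step) for _ in x]
            y = [xi + zi for xi, zi in zip(x, z)]
            for _ in range(n_repair):          # at most N_1 repair attempts
                if feasible(y):
                    break
                z = [int(zi / 2) for zi in z]  # shrink the move towards x
                y = [xi + zi for xi, zi in zip(x, z)]
            if not feasible(y):
                y = list(x)                    # give up: stay at the current point
            fy = f(y)
            # Metropolis rule: accept improvements, accept worse points with prob exp(-diff/t).
            if fy <= fx or rng.random() < math.exp(-(fy - fx) / t):
                x, fx = y, fy
                if fx < f_min:
                    x_min, f_min = list(x), fx
        t *= m
    return x_min, f_min


def make_p2_average(T, rho, C, p1, p2):
    """Objective and feasibility for (P2) with phi = iota / T (overall average).

    Assumes sigma_alpha^2 + sigma_eps^2 = 1 and rho = sigma_alpha^2.
    """
    c_star = C / T

    def f(v):
        x, y = v
        return (1.0 / T) / (x + (y - x) / (1 + (T - 1) * rho))

    def feasible(v):
        x, y = v
        return 1 <= x <= y and x * p1 + (y - x) * p2 <= c_star

    return f, feasible


if __name__ == "__main__":
    f, feasible = make_p2_average(T=3, rho=0.5, C=1200.0, p1=10.0, p2=6.0)
    print(simulated_annealing_int(f, feasible, [10, 20]))
```

Tracking $X_{\min}$ separately from the current point (Step 6) matters because the Metropolis rule deliberately accepts worse points at high temperature.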

Results and Discussion
The example used in this study is monthly consumer expenditure on food in the Netherlands in 1985, modeled using Model (1) and the so-called expenditure index panel conducted by Infomart, a marketing research agency [11]. We restrict the analysis to $\xi_1 = \xi_2 = \dots = \xi_{12}$ (annual average). The maximum likelihood estimate of $\rho$ in Eq (1) for food is 0.76, with standard error 0.005 [11]. Following [19], the survey cost was estimated at roughly USD 513,000, and the average costs of observing an individual in the cross-sections ($p_1$) and in the panel ($p_2$) were estimated at roughly USD 125 and USD 75, respectively. The following results were obtained using MATLAB.
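As an arithmetic check on these inputs, the sketch below computes the per-period budget $C^* = C/T$ and the largest affordable $N$ for a given $\lambda$ from constraint (32). It assumes $T = 12$ monthly waves and treats the USD 513,000 survey cost as the total budget $C$; both are our reading of the example rather than statements from [11] or [19]. Function names are hypothetical.

```python
def per_period_budget(C, T):
    """C* = C / T from constraint (32)."""
    return C / T


def max_sample_size(lam, C, T, p1, p2):
    """Largest integer N with lam*N*p1 + (1 - lam)*N*p2 <= C / T."""
    return int(per_period_budget(C, T) // (lam * p1 + (1 - lam) * p2))


if __name__ == "__main__":
    C, T, p1, p2 = 513_000.0, 12, 125.0, 75.0  # T = 12 monthly waves is an assumption
    print(per_period_budget(C, T), p2 / p1)
    for lam in (0.0, 0.2, 1.0):
        print(lam, max_sample_size(lam, C, T, p1, p2))
```

With these inputs $C^* = 42{,}750$ and $p_2/p_1 = 0.6$, so the budget affords at most $N = 342$ per wave with pure cross-sections, $N = 570$ with a pure panel, and $N = 502$ at $\lambda = 0.2$.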

OATD method
The benchmark parameter combination of the algorithm designed in this paper is as follows: inner iteration number $B = 2000$, initial temperature $E_0 = 10000$, temperature decrease rate $m = 0.75$, and termination temperature $\varepsilon = 0.0001$. We then analyze the effect of each parameter in turn. First, we fix the inner iteration number $B$, the temperature decrease rate $m$, and the termination temperature $\varepsilon$ at their benchmark values, and vary the initial temperature $E_0$ from 1 to 1,000,000. The optimal proportion values and the corresponding objective values are shown in Fig 1. From Fig 2, the objective values fluctuate between 0.005 and 0.015, and when the initial temperature exceeds 100,000, the optimal proportion values and the corresponding objective values converge to 0.2 and 0.005, respectively. Therefore, the initial temperature can be chosen between 100,000 and 1,000,000.
Second, we fix the initial temperature $E_0$, the temperature decrease rate $m$, and the termination temperature $\varepsilon$, and vary the inner iteration number $B$ from 500 to 3,000 to ensure that the algorithm reaches the equilibrium state. The optimal proportion values and the corresponding objective values are shown in Fig 3. From Fig 3, the higher the inner iteration number, the more easily the algorithm escapes local optima and converges to the global optimum. Conversely, the higher the inner iteration number, the longer the implementation time. In this study, the implementation time for this set of parameters is within 10 minutes, so we do not regard the increase in implementation time with the inner iteration number as a concern. From Fig 4, when the inner iteration number exceeds 2,000, the objective values and the optimal proportion values converge to 0.007 and 0.2, respectively. Therefore, the inner iteration number can be chosen between 2,000 and 2,500.
Subsequently, we fix the initial temperature $E_0$, the termination temperature $\varepsilon$, and the inner iteration number $B$, and vary the temperature decrease rate $m$ from 0.45 to 0.9. The optimal proportion values and the corresponding objective values are shown in Fig 5. If the temperature decrease rate is less than 0.6, the algorithm falls into a local optimum; when it exceeds 0.6, the algorithm escapes the local optimum and converges to the global optimum. As shown in Figs 5 and 6, the optimal proportion values and the corresponding objective values converge to 0.2 and 0.0046, respectively. The temperature decrease rate determines the search space: the larger the temperature decrease rate, the greater the search space. Consequently, the temperature decrease rate can be chosen between 0.75 and 0.9.
Finally, we fix the initial temperature $E_0$, the temperature decrease rate $m$, and the inner iteration number $B$, and vary the termination temperature $\varepsilon$ from 0.00001 to 1. Figs 7 and 8 show that when $\varepsilon$ is near 0.0001, the optimal proportion values and the corresponding objective values converge to 0.2 and 0.0043, respectively. The lower the termination temperature, the more time the algorithm has to converge to the optimal value. Therefore, the termination temperature can be chosen between 0.00001 and 0.0001.
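The one-at-a-time sweeps above share one pattern: hold every parameter at its benchmark value and vary one parameter over a grid. A minimal harness is sketched below. Here `run` stands for any callable that executes the algorithm with keyword parameters and returns a result (for example, the averaged optimal value); this interface, the parameter names, and the grid points (chosen within the ranges quoted in the text) are illustrative assumptions.

```python
def one_at_a_time(run, baseline, sweeps):
    """Vary each parameter over its grid while all others stay at baseline.

    run: callable taking the parameters as keyword arguments (hypothetical interface).
    Returns {parameter name: [(value, run result), ...]}.
    """
    results = {}
    for name, values in sweeps.items():
        rows = []
        for value in values:
            params = dict(baseline)  # copy so the baseline is never modified
            params[name] = value
            rows.append((value, run(**params)))
        results[name] = rows
    return results


# Benchmark values from the text; grids are illustrative points within the quoted ranges.
BASELINE = {"e0": 10_000.0, "inner_iters": 2000, "m": 0.75, "eps": 1e-4}
SWEEPS = {
    "e0": [1.0, 10.0, 100.0, 1e3, 1e4, 1e5, 1e6],
    "inner_iters": [500, 1000, 1500, 2000, 2500, 3000],
    "m": [0.45, 0.6, 0.75, 0.9],
    "eps": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0],
}


def echo(**params):
    """Stand-in for the real algorithm: returns the parameters it received."""
    return dict(params)


if __name__ == "__main__":
    out = one_at_a_time(echo, BASELINE, SWEEPS)
    for value, used in out["m"]:
        print(value, used)
```

Replacing `echo` with a wrapper that calls the annealing routine several times and averages the optimal values would produce the kind of curves summarized in Figs 1 to 8.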

FD method
Based on the above discussion, we analyze the algorithm designed in this paper in two cases: the best case and the worst case. For each case, the parameter ranges obtained from the above OATD analysis are divided into two levels (+ and −), as shown in Table 2.
For each case, the temperature decrease rate $m$, the inner iteration number $B$, and the termination temperature $\varepsilon$ are chosen randomly from their two levels, giving six parameter values in all. For each parameter combination, we run the algorithm 10 times and take the average of the optimal values. The initial temperature is held constant over its whole range, since its influence on trapping the algorithm in a local optimum is very small and the resulting errors are smaller than those caused by the other parameters. Compared with the other parameters, the initial temperature has a low quantitative influence on the algorithm designed in this paper, and it is therefore not analyzed further.
Table 3 shows that the effects of the temperature decrease rate $m$ and the inner iteration number $B$ are clear, meaning that the choice of these two parameters determines whether the algorithm can obtain the optimal value. The two parameters affect the optimal value differently in the two cases. In the best case, the inner iteration number is large enough to guarantee stable solutions, and increasing it further does not improve the solutions much; adjusting the temperature decrease rate can then continue to narrow the neighborhood range and give more convergent results. Therefore, the temperature decrease rate influences the best case. In the worst case, the inner iteration number has a significant impact, since a lower inner iteration number causes a wider search range, which drives the search away from the optimal solution; in this case, adjusting the temperature decrease rate more easily moves the result to a local optimal solution. Furthermore, Table 3 shows that the interaction effect of the temperature decrease rate and the inner iteration number is also clear.
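The factorial analysis can be sketched as a full two-level design over $m$, $B$, and $\varepsilon$ (three factors at two levels each, i.e., the six level values, giving $2^3 = 8$ combinations), with each effect estimated as the mean response at the "+" level of its contrast minus the mean at the "−" level. The actual "+" and "−" values would come from Table 2, which is not reproduced here. Levels are coded $\pm 1$, the responses would be the 10-run averages, and the synthetic response in the usage block is for illustration only, not data from the paper. Function names are hypothetical.

```python
from itertools import product


def full_factorial(factors):
    """All 2^k runs of a two-level design, levels coded -1 (low) and +1 (high)."""
    return [dict(zip(factors, levels)) for levels in product((-1, 1), repeat=len(factors))]


def factorial_effects(design, responses, factors):
    """Main effects and two-factor interactions as mean(y | +1) - mean(y | -1)."""
    terms = [(a,) for a in factors]
    terms += [(a, b) for i, a in enumerate(factors) for b in factors[i + 1:]]
    effects = {}
    for term in terms:
        # Contrast column: product of the coded levels of the factors in the term.
        signs = []
        for run in design:
            s = 1
            for name in term:
                s *= run[name]
            signs.append(s)
        plus = [y for s, y in zip(signs, responses) if s > 0]
        minus = [y for s, y in zip(signs, responses) if s < 0]
        effects[":".join(term)] = sum(plus) / len(plus) - sum(minus) / len(minus)
    return effects


if __name__ == "__main__":
    factors = ["m", "B", "eps"]
    design = full_factorial(factors)
    # Synthetic response for illustration only (not data from the paper).
    y = [1 + 2 * r["m"] + 0.5 * r["B"] + 0.25 * r["m"] * r["B"] for r in design]
    print(factorial_effects(design, y, factors))
```

Unlike the one-at-a-time sweeps, the `m:B` contrast estimates the joint effect of the temperature decrease rate and the inner iteration number, which is the interaction reported in Table 3.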

Conclusions
In this paper, we discuss how to determine the optimal proportion of a series of cross-sections in all samples that maximizes the efficiency of the survey design (i.e., minimizes the variances of the estimators) in the analysis-of-variance model; this proportion can be applied directly in sampling. First, we derive a theorem for choosing the optimal proportion of a series of cross-sections in all samples, irrespective of the parameters of interest and the budget. Our results also show that, compared with a pure series of cross-sections or a pure panel, the gains from choosing a split panel can be substantial. Second, we consider the efficiency of split panel designs under a given budget and design an efficient algorithm to solve the associated constrained nonlinear integer optimization. We further apply the OATD and FD methods to analyze and compare the quantitative influence of different parameter choices on the algorithm, using an empirical example concerning monthly consumer expenditure on food in the Netherlands in 1985, and obtain efficient ranges of the algorithm parameters that ensure a good solution.
In further research, we will extend the results to a more general analysis-of-covariance model and derive expressions for the variances of the efficient parameter estimators. Other algorithms can also be used to solve the new nonlinear programming problem arising from the optimal split panel design in the analysis-of-covariance model.

$\ldots\{(B-b)(B-b)^T(B-b)(B-b)^T - 2[(B,B)-(B,b)](B-b)(B-b)^T + [(B,B)-(B,b)]^2\} = I$. Therefore, matrix $Q$ is an orthogonal matrix.