The Efficiency of Split Panel Designs in an Analysis of Variance Model

Xin Liu; Wei-Guo Wang; Hai-Jun Liu

doi:10.1371/journal.pone.0154913

Abstract

We consider the efficiency of split panel designs in analysis of variance models, that is, the determination of the optimal proportion of cross-sections in all samples so as to minimize the variances of best linear unbiased estimators of linear combinations of parameters. An orthogonal matrix is constructed to obtain a manageable expression for the variances. On this basis, we derive a theorem for analyzing split panel design efficiency irrespective of the parameters of interest and the budget. Additionally, the relative efficiency of an estimator based on the split panel compared with an estimator based on a pure panel or a pure cross-section is presented. The analysis shows that the gains from a split panel can be quite substantial. We further consider the efficiency of split panel design given a budget, and transform it into a constrained nonlinear integer programming problem. Specifically, an efficient algorithm is designed to solve this problem. Moreover, we combine one-at-a-time designs and factorial designs to illustrate the algorithm’s efficiency with an empirical example concerning monthly consumer expenditure on food in 1985 in the Netherlands, and the efficient ranges of the algorithm parameters are given to ensure a good solution.

Citation: Liu X, Wang W-G, Liu H-J (2016) The Efficiency of Split Panel Designs in an Analysis of Variance Model. PLoS ONE 11(5): e0154913. https://doi.org/10.1371/journal.pone.0154913

Editor: Boris Podobnik, University of Rijeka, CROATIA

Received: November 21, 2014; Accepted: April 21, 2016; Published: May 10, 2016

Copyright: © 2016 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the References section of the manuscript or are owned by Informart, a market research agency. Interested researchers may contact http://www.sciencedirect.com/science/article/pii/030440769090013J for access to data from Informat.

Funding: This work is supported financially by talent scientific research fund of LSHU, Social Science Foundation of China (No. 15BKS073), Natural Science Foundation of China (No. 71171035, 71271045, 71571033, 11301060), the Program for Discipline Construction in DUFF (No. XKT-201411) and Outstanding scientific innovation talents program of DUFE (no.DUFE2014R20). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

A split panel combines the advantages of other sampling schemes (repeated, cross-sectional, and rotating samples) and provides rich, convenient, and practical information, which is why it is widely applied in many fields [1]. In economic surveys, researchers typically consider statistical models that allow for complex relationships, and data must be collected to estimate the parameters of these models. As a newer sample type that combines the advantages of the three basic samples, the split panel can supply rich data for such models. With the wide application of micro-economic data, panel conditioning and panel nonresponse have become more important in econometrics, as is well documented in the literature [1]. A split panel, as a combination of a panel and a repeated or rotating panel, uses changing samples to recruit replacements and thereby counter panel conditioning and panel nonresponse [2–3]. In most fields, such as finance, labor economics, and political economy, data collection is costly. A split panel offers flexibility in the size of the cross-sections and continually updated information [4–5]. It is therefore important to use a split panel with an optimal sample design to obtain as much information as possible from a given budget. In recent years, the theory of split panel sampling has seen theoretical advances and applications across the pure and applied sciences, and it is expected to be widely used in the future [6–7]. However, limited attention has been paid recently to the efficiency of split panel designs. In the early literature, the estimation of a time-dependent mean from several kinds of rotating samples (a special form of split panel) and the resulting variances were examined by Patterson [8] and Eckler [9]. 
It has been documented that the optimal design of the sample depends on the parameter of interest (see [10], p. 152). On this basis, Nijman et al. [11] determined the optimal split panel design, that is, how to choose the proportion of a given budget to spend on collecting a series of cross-sections so as to optimize split panel design efficiency. In sampling practice, however, what is needed is the optimal proportion of a series of cross-sections in all samples, and this cannot be obtained accurately from the proportion of the budget spent on the series of cross-sections [11]. Consequently, sampling costs cannot be saved by following [11]. Moreover, [11] gives no optimization algorithm for the split panel design. Researchers and practitioners who want to solve for the optimal split panel design must select or design an appropriate optimization algorithm themselves and compute the optimal proportion using optimization theory. In practice, this makes computing the optimal proportion less efficient and the solution less accurate. Hence, designing a split panel within the research framework of [11] is not attractive.

In this paper, the goal is to optimize the efficiency of split panel design in the analysis-of-variance model by determining the optimal proportion of a series of cross-sections in all samples, a proportion that can be applied directly in sampling. This is an extension of [11], and the main contributions can be summarized as follows. First, we show how to choose the proportion of a series of cross-sections in all samples to minimize the variances of estimators in the analysis-of-variance model, irrespective of the parameters of interest and budget. In particular, we present the relative efficiency of estimators based on the split panel compared with an estimator based on a pure panel or a pure cross-section. Second, we transform the efficiency of split panel design under a budget constraint into a constrained nonlinear integer optimization problem, which is difficult to solve with mature optimization algorithms. The simulated annealing algorithm has the advantages of being able to reach the global optimum, allowing a randomly selected initial solution, and being simple and practical [12]. However, when the simulated annealing algorithm is used to solve the constrained nonlinear integer optimization associated with the efficiency of split panel design under a budget constraint, it is difficult to choose a combination of parameters, such as the inner iteration number, the initial temperature, and the temperature decrease rate, that gives the best performance of the algorithm. To date, there is no theoretical method to solve this problem. Therefore, in this paper we design an efficient algorithm, based on the simulated annealing algorithm, to solve the constrained nonlinear integer optimization of split panel design efficiency under a budget constraint. In the context of numerical modeling, sensitivity analysis studies how different values of an independent variable affect a particular dependent variable under a given set of assumptions. 
It has been widely applied in many fields, such as economics, engineering, and ecology. Through sensitivity analysis, modelers can determine whether the parameters of a model or algorithm give reliable predictions. Hence, third, we introduce sensitivity analysis to appraise the parameters of the proposed algorithm. The one-at-a-time design (OATD) method is one of the most common approaches for assessing the effect of inputs on the output [13–16], and it is frequently used because, in case of model failure, the modeler immediately knows which input factor is responsible [17]. Yet the OATD method cannot be used if two factors are interdependent, because it studies the effect of only one variable at a time. The factorial design (FD) method differs from the OATD method: it studies the effects of several factors on a response, together with the interactions between the factors, by varying the levels of all factors at the same time. As such, the OATD and FD methods are both chosen to analyze the effect of the parameters, each compensating for the deficiency of the other [16]. On the other hand, the simulated annealing algorithm has no special requirements, and its performance does not change across different examples [12]. Hence, with an empirical example concerning monthly consumer expenditure on food in 1985 in the Netherlands, we combine the OATD and FD methods to analyze the algorithm designed in this paper. The result is a set of efficient ranges of the algorithm parameters that ensure a good solution (i.e., an accurate optimal proportion). Therefore, the research results in this paper should be useful to both researchers and practitioners in sampling.

This paper is organized as follows: section 2, based on the analysis-of-variance model, transforms the efficiency of split panel design into a nonlinear optimization; in section 3, the efficiency of split panel design, irrespective of the parameters of interest and budget, is discussed; in section 4, we consider the efficiency of split panel design given a budget constraint and design an efficient algorithm based on simulated annealing to solve the resulting constrained nonlinear integer optimization; section 5 combines the OATD and FD methods to illustrate the algorithm’s efficiency, with an empirical example of monthly consumer expenditure on food in 1985 in the Netherlands, and gives the efficient ranges of the algorithm parameters that ensure a good solution; and section 6 concludes the paper.

Materials and Methods

Theoretical results of parameter estimators variances

In this paper, we consider split panel design efficiency by minimizing the variance of the best linear unbiased estimator of linear combinations ϕ′β of the period means in the analysis-of-variance model: y_it = β_t + α_i + ε_it (1) where i = 1,…,N, t = 1,…,T and ϕ′ = (ϕ₁,ϕ₂,…,ϕ_T); the α_i and ε_it are independent and identically distributed (i.i.d.) random variables with mean 0 and variances σ_α² and σ_ε², respectively, and are mutually independent. Throughout this paper we assume, for simplicity, that the parameters σ_α² and σ_ε² are known a priori. If these parameters are unknown, consistent estimators can be used in their place and the same results hold asymptotically [18]. Important special cases are the determination of the optimal design if the parameter of interest is the period mean β_t, the change between two subsequent period means β_t−β_t−1, or the overall average of the period means (β₁+…+β_T)/T.

We denote the sample size in each wave by N and the proportion of cross-sections in all samples by λ, so that λN individuals are observed in fresh cross-sections while the remaining (1−λ)N individuals are re-interviewed every period. In order to determine the optimal value of λ, we first derive the efficient estimator and its variance. It is well known that the estimator of β′ = (β₁,β₂,…,β_T) in Eq (1) that uses only the information on individuals who are re-interviewed every period is the best linear unbiased estimator based on that information [11]. Analogously, the estimator based on the cross-section information only is also the best linear unbiased estimator [11], and that (2) (3) where J denotes the number of observed individuals. Therefore (4) (5) where l_T = (1,…,1)_T×1′ and , .

Since the panel-based and cross-section-based estimators are independent, based on the theory of combining two independent sample estimators [10, 11], the best linear unbiased estimator that uses all the samples is given by (6)

For (7) it is easily verified that (8)

Consequently, the efficiency of split panel design can be transformed into the following nonlinear optimization, which minimizes the variance of the best unbiased estimator of ϕ′β: (9)
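As a sketch only, the standard forms of these quantities can be written out under the assumption that Eq (1) is the one-way error-components model with period means. The symbols \hat\beta_P, \hat\beta_C, V_P, V_C and I_T are introduced here for illustration and are not necessarily the notation of Eqs (2)–(9); the expressions hold for 0 < λ < 1, with the pure designs obtained as limits.

```latex
\begin{align*}
y_{it} &= \beta_t + \alpha_i + \varepsilon_{it}, \\
V_P = \operatorname{Var}(\hat\beta_P) &= \frac{1}{(1-\lambda)N}
      \left(\sigma_\varepsilon^2 I_T + \sigma_\alpha^2\, l_T l_T'\right), \\
V_C = \operatorname{Var}(\hat\beta_C) &= \frac{\sigma_\alpha^2+\sigma_\varepsilon^2}{\lambda N}\, I_T, \\
\hat\beta &= \left(V_P^{-1}+V_C^{-1}\right)^{-1}
      \left(V_P^{-1}\hat\beta_P + V_C^{-1}\hat\beta_C\right), \\
\operatorname{Var}(\phi'\hat\beta) &= \phi'\left(V_P^{-1}+V_C^{-1}\right)^{-1}\phi .
\end{align*}
```

Here \hat\beta_P and \hat\beta_C are taken to be the vectors of per-period sample means over the (1−λ)N panel individuals and the λN cross-section individuals, respectively. The panel covariance has off-diagonal terms because the same α_i enters every wave, whereas fresh cross-sections are uncorrelated across waves.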

In order to obtain the optimal solution for λ from Eq (9), we first derive a manageable expression for the variance of the best unbiased estimator under the model in Eq (1). First, and can be written as (10) and (11) where (12) and (13)

Since is a constant multiple of the identity matrix, and is symmetric, there exists an orthogonal matrix Q such that and , where D is a diagonal matrix and written as (14) and the orthogonal matrix Q can be written as (15)

The proof of the construction of the orthogonal matrix Q is presented in Appendix 1A.
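A small numerical check can make the diagonalization concrete. The sketch below builds one possible orthogonal matrix (a Helmert-type construction, which is an assumption and not necessarily the Q of Eq (15)) and verifies that it diagonalizes a matrix of the form σ_ε² I + σ_α² l_T l_T′. The function name and the variance values are hypothetical.

```python
import numpy as np

def helmert_q(T):
    """Return an orthogonal T x T matrix whose first column is l_T / sqrt(T)."""
    Q = np.zeros((T, T))
    Q[:, 0] = 1.0 / np.sqrt(T)
    for k in range(1, T):
        norm = np.sqrt(k * (k + 1))
        Q[:k, k] = 1.0 / norm      # first k entries are equal
        Q[k, k] = -k / norm        # next entry makes the column sum to zero
    return Q

T = 5
sigma_a2, sigma_e2 = 0.6, 0.4      # hypothetical variance components
Q = helmert_q(T)
ones = np.ones((T, 1))
A = sigma_e2 * np.eye(T) + sigma_a2 * (ones @ ones.T)

# Q'AQ should be diagonal: one eigenvalue along l_T, the rest equal to sigma_e2.
D = Q.T @ A @ Q
expected = np.diag([sigma_e2 + T * sigma_a2] + [sigma_e2] * (T - 1))
print("Q orthogonal:", np.allclose(Q.T @ Q, np.eye(T)))
print("Q'AQ diagonal:", np.allclose(D, expected))
```

Any orthogonal matrix whose first column is proportional to l_T would serve equally well, since the remaining eigenvalue is repeated T−1 times.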

As such, the variance of the best unbiased estimator of ϕ′β using all the samples is written as (16) where (17)

We denote ϕ′Q = (δ₁,…,δ_T) = δ′ to obtain the simple expression of Eq (16), and rewrite it as (18)

Consequently, the nonlinear optimization that minimizes the variance of the best unbiased estimator of ϕ′β can be rewritten as (19)
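To illustrate how the variance depends on λ, the following sketch evaluates it in two ways: directly from the combined precision matrix, and from a diagonalized form with two distinct eigenvalues. It assumes the one-way error-components model y_it = β_t + α_i + ε_it; the function names and all numerical values are hypothetical, and the expressions are not claimed to match Eqs (16)–(19) symbol for symbol.

```python
import numpy as np

def blue_variance(phi, lam, N, sigma_a2, sigma_e2):
    """Variance of phi' beta_hat from the combined panel and cross-section precision."""
    phi = np.asarray(phi, dtype=float)
    T = phi.size
    ones = np.ones((T, 1))
    panel_cov = sigma_e2 * np.eye(T) + sigma_a2 * (ones @ ones.T)
    precision = ((1.0 - lam) * N * np.linalg.inv(panel_cov)
                 + lam * N / (sigma_a2 + sigma_e2) * np.eye(T))
    return float(phi @ np.linalg.solve(precision, phi))

def blue_variance_diag(phi, lam, N, sigma_a2, sigma_e2):
    """Same variance, using only the two distinct eigenvalues of the precision matrix."""
    phi = np.asarray(phi, dtype=float)
    T = phi.size
    delta1_sq = phi.sum() ** 2 / T                 # squared component along l_T
    rest_sq = max(phi @ phi - delta1_sq, 0.0)      # squared norm of the remainder
    cs = lam * N / (sigma_a2 + sigma_e2)
    d1 = (1.0 - lam) * N / (sigma_e2 + T * sigma_a2) + cs
    d2 = (1.0 - lam) * N / sigma_e2 + cs
    return float(delta1_sq / d1 + rest_sq / d2)

T, N = 6, 100
sigma_a2, sigma_e2 = 0.6, 0.4                      # hypothetical values
targets = {
    "period mean": np.eye(T)[0],
    "change": np.eye(T)[1] - np.eye(T)[0],
    "overall average": np.full(T, 1.0 / T),
}
for name, phi in targets.items():
    values = [blue_variance(phi, lam, N, sigma_a2, sigma_e2) for lam in (0.0, 0.5, 1.0)]
    print(name, [round(v, 5) for v in values])
```

Working with the precision matrix rather than the two covariance matrices keeps the pure designs λ = 0 and λ = 1 well defined, because only one of the two terms vanishes at each endpoint.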

Split panel design efficiency irrespective of the parameters of interest and budget

By considering the linear combinations of vector β, we can then easily adapt the results to an individual element, difference of elements, or overall average. As such, in this section, we will derive a theorem for the split panel design efficiency, irrespective of the parameters of interest and budget, using Eq (19).

Theorem 1 Pure panel (λ = 0) will minimize the variance of the best unbiased estimator of ϕ′β, irrespective of the choice of ϕ, (20)
pure series of cross-sections (λ = 1) will minimize the variance of the best unbiased estimator of ϕ′β, irrespective of the choice of ϕ, (21)
split panel (λ = k_r) will minimize the variance of the best unbiased estimator of ϕ′β, irrespective of the choice of ϕ, (22)
split panel (λ = k_l) will minimize the variance of the best unbiased estimator of ϕ′β, irrespective of the choice of ϕ, (23)
split panel (λ = λ₀) will minimize the variance of the best unbiased estimator of ϕ′β, irrespective of the choice of ϕ, (24)
where k_l is the left root of k(λ); k_r is the right root of k(λ), (25)

If (26) if (27) And (28) (29) (30)

The proof of Theorem 1 is presented in Appendix 1B.

From Theorem 1, it can be easily checked that the estimator of the change between two subsequent period means, β_t−β_t−1, has the smallest variance if a pure panel (λ = 0) is used. Likewise, a pure series of cross-sections (λ = 1) will be optimal if the overall average of period means is to be estimated.

In order to illustrate that the split panel design will be preferable to pure panel or pure series of cross-sections design in most cases, and how much efficiency will be lost if a suboptimal choice is made when the period mean β_t is the parameter of interest, we present in Table 1 the relative efficiency of the estimator based on the split panel to an estimator based on a pure series of cross-sections or pure panel (pure series of cross-sections and pure panel yield equally efficient estimators in this case). Similar to [18], we assume the observation period T = 3,6,12,20, the proportion of the component of variance ρ = 0.3,0.6,0.9, and the proportions of a series of cross-sections in split panel λ = 1/2,1/3,1/4,1/8,1/12.

Table 1. The relative efficiency compared to pure cross-sections (or pure panel) for the estimator of the period mean β_t.

https://doi.org/10.1371/journal.pone.0154913.t001
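The grid of T, ρ and λ described above can be explored with a short script. This is a sketch under stated assumptions: ρ is taken to be σ_α²/(σ_α²+σ_ε²), relative efficiency is taken to be the variance under the split panel divided by the variance under a pure design, and the model is y_it = β_t + α_i + ε_it. The function name is hypothetical, and the output is not claimed to reproduce the entries of Table 1.

```python
def relative_efficiency(T, rho, lam):
    """Variance of the period-mean estimator under a split panel, relative to a pure design.

    Total variance and sample size are normalized to 1, so a pure panel and a
    pure series of cross-sections both give variance 1 for a single period mean.
    """
    d1 = (1.0 - lam) / (1.0 - rho + T * rho) + lam   # precision along l_T
    d2 = (1.0 - lam) / (1.0 - rho) + lam             # precision orthogonal to l_T
    return (1.0 / T) / d1 + ((T - 1.0) / T) / d2

lambdas = (1 / 2, 1 / 3, 1 / 4, 1 / 8, 1 / 12)
for T in (3, 6, 12, 20):
    for rho in (0.3, 0.6, 0.9):
        row = [round(relative_efficiency(T, rho, lam), 3) for lam in lambdas]
        print(f"T={T:2d} rho={rho:.1f}", row)
```

Values below 1 indicate that the split panel yields a smaller variance than either pure design for the period mean.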

Split panel design efficiency with budget constraint and main algorithm

As opposed to the previous section, where we analyzed split panel design efficiency irrespective of the parameters of interest and budget, in this section we consider split panel design efficiency with a given budget. Let p₁ denote the average cost of observing an individual in the cross-sections and p₂ the average cost of observing an individual in the panel. The cost of a cross-sectional survey is 30% to 70% higher than that of an additional wave of the Panel Study of Income Dynamics, as shown by Duncan et al. [19]. Therefore, we obtain 1.3p₂ ≤ p₁ ≤ 1.7p₂. If there is a budget, C, for all the periods, we can obtain the constrained nonlinear integer optimization (P1), as follows (31) (32) (33) (34) where .

Applying Eq (19) and λN = x, N = y, we obtain the constrained nonlinear integer optimization (P2): (35) (36) (37) (38) where .

Eq (35) is the objective function that minimizes the variance of the best linear unbiased estimator of linear combinations of the period means, while Eq (36) satisfies the constraint of a given budget.
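For small instances, a problem of this type can be solved by exhaustive enumeration, which gives a baseline against which a heuristic can be checked. The sketch below assumes the budget constraint has the form T(p₁x + p₂(y − x)) ≤ C and uses the variance implied by the model y_it = β_t + α_i + ε_it; both are assumptions, as are the function names and the numerical values, which are illustrative and not the data of the empirical example.

```python
def variance(x, y, phi, sigma_a2, sigma_e2):
    """Variance of phi' beta_hat with x cross-section and y - x panel individuals per wave."""
    T = len(phi)
    delta1_sq = sum(phi) ** 2 / T
    rest_sq = max(sum(p * p for p in phi) - delta1_sq, 0.0)
    cs = x / (sigma_a2 + sigma_e2)
    d1 = (y - x) / (sigma_e2 + T * sigma_a2) + cs
    d2 = (y - x) / sigma_e2 + cs
    return delta1_sq / d1 + rest_sq / d2

def best_design(phi, budget, p1, p2, sigma_a2, sigma_e2):
    """Enumerate all integer designs (x, y) within the assumed budget constraint."""
    T = len(phi)
    y_max = int(budget // (T * min(p1, p2)))
    best = None
    for y in range(1, y_max + 1):
        for x in range(0, y + 1):
            if T * (p1 * x + p2 * (y - x)) > budget:
                continue                      # design exceeds the budget
            v = variance(x, y, phi, sigma_a2, sigma_e2)
            if best is None or v < best[0]:
                best = (v, x, y)
    return best

phi = [1.0, 0.0, 0.0, 0.0]                    # a single period mean, T = 4 (hypothetical)
v, x, y = best_design(phi, budget=12000, p1=125, p2=75, sigma_a2=0.76, sigma_e2=0.24)
print(f"x = {x}, y = {y}, lambda = {x / y:.3f}, variance = {v:.5f}")
```

Enumeration costs on the order of y_max² objective evaluations, so it is practical only when the per-wave sample size is modest.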

Algorithm design.

In section 4, we transformed the efficiency of split panel design into the constrained nonlinear integer optimization (P2), which is difficult to solve with current mature optimization algorithms. The simulated annealing algorithm has the advantages of being able to reach the global optimum, allowing a randomly selected initial solution, and being simple and practical. However, when it is used to solve the constrained nonlinear integer optimization associated with the efficiency of split panel design given a budget, it is difficult to choose a combination of parameters, such as the inner iteration number, the initial temperature, and the temperature decrease rate, that gives the best performance of the algorithm. Consequently, in this paper, we design an efficient algorithm, based on the simulated annealing algorithm, to solve the constrained nonlinear integer optimization associated with split panel design efficiency given a budget.

The steps of the simulated annealing algorithm designed to solve (P2) are given as follows:

Step 1. Choose the initial integer solution x₀ ∈ D and the initial temperature value T₀>0, where D is a feasible region formed by Eqs (4) and (5); calculate f(x₀) and let (39)
Step 2. Randomly generate the integer vector (40) where (41) and is the ith component of the random vector Z^K; U₁,U₂,…,U_n is a group of mutually independent random variables distributed uniformly over [−1,1]; sign(⋅) is the sign function; and 〈⋅〉 denotes rounding to an integer.
Step 3. Use the current iteration point X^K and the random vector Z^K to generate a new iteration point Y^K = X^K+Z^K. If Y^K∈D, move to the next step. If Y^K∉D, recalculate Y^K by (42) for l = 1,2,…,N₁ until Y^K∈D, and then move to the next step. If Y^K∉D after N₁ attempts, let Y^K = X^K and move to the next step.
Step 4. Generate a random number η distributed uniformly over [0,1] and calculate (43) using the current iteration point X^K and the new iteration point Y^K.
Step 5. If (44) let (45) otherwise let (46)
Step 6. If (47) let (48)
Step 7. If the stopping criterion (49) is satisfied, stop calculating and regard X_min and f_min as the approximate global optimal solution and the corresponding optimal value, respectively. If not, move to the next step.
Step 8. Generate a new temperature T_K+1 by using the temperature update function (50), let K = K+1, and return to the second step.
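The steps above can be sketched in code as follows. This is a minimal illustration rather than the authors' implementation: the integer perturbation rule, the geometric cooling T_K+1 = m·T_K, the budget constraint T(p₁x + p₂(y − x)) ≤ C, and the closed-form objective (derived from the model y_it = β_t + α_i + ε_it) are all assumptions, and every function name and default value is hypothetical.

```python
import math
import random

def objective(x, y, phi, sigma_a2, sigma_e2):
    """Variance of phi' beta_hat with x cross-section and y - x panel individuals per wave."""
    T = len(phi)
    delta1_sq = sum(phi) ** 2 / T
    rest_sq = max(sum(p * p for p in phi) - delta1_sq, 0.0)
    cs = x / (sigma_a2 + sigma_e2)
    d1 = (y - x) / (sigma_e2 + T * sigma_a2) + cs
    d2 = (y - x) / sigma_e2 + cs
    return delta1_sq / d1 + rest_sq / d2

def feasible(x, y, T, budget, p1, p2):
    """Assumed feasible region: integer design within the total budget."""
    return y >= 1 and 0 <= x <= y and T * (p1 * x + p2 * (y - x)) <= budget

def anneal(phi, budget, p1, p2, sigma_a2, sigma_e2, t0=1e4, m=0.75,
           eps=1e-4, inner=200, beta=1.0, step=5, retries=20, seed=0):
    rng = random.Random(seed)
    T = len(phi)
    current = (0, 1)                               # Step 1: a feasible integer start
    f_cur = objective(*current, phi, sigma_a2, sigma_e2)
    best, f_best = current, f_cur
    temp = t0
    while temp > eps:                              # Step 7: stop at the termination temperature
        for _ in range(inner):
            cand = current                         # Step 3 fallback: keep X^K if no feasible Y^K
            for _ in range(retries):
                z = [round(step * rng.uniform(-1, 1)) for _ in range(2)]   # Step 2
                trial = (current[0] + z[0], current[1] + z[1])
                if feasible(*trial, T, budget, p1, p2):
                    cand = trial
                    break
            f_cand = objective(*cand, phi, sigma_a2, sigma_e2)
            delta = f_cand - f_cur
            # Steps 4-5: Metropolis rule; improvements are always accepted.
            if delta <= 0 or rng.random() < math.exp(-delta / (beta * temp)):
                current, f_cur = cand, f_cand
            if f_cur < f_best:                     # Step 6: track the best point so far
                best, f_best = current, f_cur
        temp *= m                                  # Step 8: assumed geometric cooling
    return best, f_best

phi = [1.0, 0.0, 0.0, 0.0]                         # hypothetical target: one period mean
best, f_best = anneal(phi, budget=12000, p1=125, p2=75, sigma_a2=0.76, sigma_e2=0.24)
print("x, y =", best, "lambda =", round(best[0] / best[1], 3), "variance =", round(f_best, 5))
```

Accepting improvements without evaluating the exponential avoids overflow at low temperatures, and keeping the best point seen so far means the returned design is never worse than the starting one.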

The detailed design process of the simulated annealing algorithm is presented in Appendix 1C.

Results and Discussion

The example used in this study is the monthly consumer expenditure on food in 1985, in the Netherlands, which is modeled using Model (1) and the so-called expenditure index panel conducted by Infomart, a marketing research agency [11]. We restrict analysis to ξ₁ = ξ₂ = … = ξ₁₂ (annual average). The maximum likelihood estimate of ρ in Eq (1) for food is 0.76, with standard error 0.005 [11]. From [19], the survey cost was estimated to be roughly USD 513,000. The average cost of observing every individual in cross-section p₁ and the average cost of observing every individual in panels p₂ were estimated to be roughly USD 125 and USD 75, respectively. The following results are obtained using MATLAB.

OATD method

The benchmarking parameter combinations of the algorithm designed in this paper are set as follows: the inner iteration number B = 2000, the initial temperature E₀ = 10000, the temperature decrease rate m = 0.75 and the termination temperature ε = 0.0001. Subsequently, we analyze the effects of these parameters.

First, we fix the inner iteration number B, the temperature decrease rate m, and the termination temperature ε at their benchmark values, and vary the initial temperature E₀ from 1 to 1,000,000. The optimal proportion values and the corresponding objective values are shown in Figs 1 and 2. From Fig 2, the objective values fluctuate between 0.005 and 0.015 and, when the initial temperature is more than 100,000, the optimal proportion values and the corresponding objective values converge to 0.2 and 0.005, respectively. Therefore, the initial temperature can be chosen between 100,000 and 1,000,000.

Fig 1. The effect of initial temperature on the optimal proportion value.

https://doi.org/10.1371/journal.pone.0154913.g001

Fig 2. The effect of initial temperature on the objective function value.

https://doi.org/10.1371/journal.pone.0154913.g002

Second, we fix the initial temperature E₀, the temperature decrease rate m, and the termination temperature ε at their benchmark values, and vary the inner iteration number from 500 to 3,000 to ensure that the algorithm reaches the balanced state. The optimal proportion values and the corresponding objective values are shown in Figs 3 and 4. From Fig 3, the higher the inner iteration number is, the more easily the algorithm escapes from a local optimal value and converges to the global optimal value. However, the higher the inner iteration number is, the longer the implementation time. In this study, the implementation time for the parameter sets considered is within 10 minutes, so we disregard the increase in implementation time caused by a larger inner iteration number. From Fig 4, when the inner iteration number is more than 2,000, the objective values and the optimal proportion values converge to 0.007 and 0.2, respectively. Therefore, the inner iteration number can be chosen between 2,000 and 2,500.

Fig 3. The effect of inner iteration number on the optimal proportion value.

https://doi.org/10.1371/journal.pone.0154913.g003

Fig 4. The effect of inner iteration number on the objective function value.

https://doi.org/10.1371/journal.pone.0154913.g004

Subsequently, we fix the initial temperature E₀, the termination temperature ε, and the inner iteration number B at their benchmark values, and vary the temperature decrease rate m from 0.45 to 0.9. The optimal proportion values and the corresponding objective values are shown in Figs 5 and 6. If the temperature decrease rate is less than 0.6, the algorithm becomes trapped in a local optimal value. When the temperature decrease rate is more than 0.6, the algorithm escapes the local optimal value and converges to the global optimal value. As shown in Figs 5 and 6, the optimal proportion values and the corresponding objective values converge to 0.2 and 0.0046, respectively. The temperature decrease rate determines the search space: the larger the temperature decrease rate is, the greater the search space. Consequently, we can choose the temperature decrease rate between 0.75 and 0.9.

Fig 5. The effect of temperature decrease on the optimal proportion value.

https://doi.org/10.1371/journal.pone.0154913.g005

Fig 6. The effect of temperature decrease on the objective function value.

https://doi.org/10.1371/journal.pone.0154913.g006

Finally, we fix the initial temperature E₀, the temperature decrease rate m, and the inner iteration number B at their benchmark values, and vary the termination temperature ε from 0.00001 to 1. Figs 7 and 8 show that when the termination temperature ε is near 0.0001, the optimal proportion values and the corresponding objective values converge to 0.2 and 0.0043, respectively. The lower the termination temperature is, the more time the algorithm has to converge to the optimal value. Therefore, the termination temperature can be chosen between 0.00001 and 0.0001.

Fig 7. The effect of termination temperature on the optimal proportion value.

https://doi.org/10.1371/journal.pone.0154913.g007

Fig 8. The effect of termination temperature on the objective function value.

https://doi.org/10.1371/journal.pone.0154913.g008
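The one-at-a-time procedure used in this section can be sketched generically: each parameter is swept over a grid while the others stay at the benchmark values. In the sketch below, the benchmark values are those stated above, but the grid points, the function names, and the stand-in `toy_run` are hypothetical; `toy_run` is a placeholder for one run of the algorithm and does not model the reported results.

```python
def one_at_a_time(run, baseline, grids):
    """Sweep each parameter over its grid while holding the others at the baseline."""
    results = {}
    for name, values in grids.items():
        sweep = []
        for value in values:
            params = dict(baseline)      # copy so the baseline is never modified
            params[name] = value
            sweep.append((value, run(**params)))
        results[name] = sweep
    return results

def toy_run(B, E0, m, eps):
    """Hypothetical stand-in for one run of the annealing algorithm."""
    return 1.0 / B + 1.0 / E0 + abs(m - 0.8) + eps

baseline = {"B": 2000, "E0": 10000, "m": 0.75, "eps": 0.0001}
grids = {
    "E0": [1, 100, 10000, 1000000],
    "B": [500, 1000, 2000, 3000],
    "m": [0.45, 0.6, 0.75, 0.9],
    "eps": [0.00001, 0.0001, 0.01, 1],
}
results = one_at_a_time(toy_run, baseline, grids)
for name, sweep in results.items():
    print(name, sweep)
```

Because only one parameter moves in each sweep, the output attributes any change in the response to that parameter alone, which is also why interactions between parameters remain invisible to this design.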

FD method

According to the above discussion, we analyze the algorithm designed in this paper in two cases, the best case and the worst case. For these two cases, the parameter ranges obtained from the above OATD analysis are separated into two levels (+ and -), as shown in Table 2.

Table 2. Parameters levels.

https://doi.org/10.1371/journal.pone.0154913.t002

For each case, the temperature decrease rate m, the inner iteration number B, and the termination temperature ε are chosen randomly from their two parameter levels, giving six values in all. Under each parameter combination, we run the algorithm 10 times and take the average of the optimal values. The initial temperature is held constant, since its effect on the optimal value is very small and the resulting errors are smaller than those from the other parameters. Compared with the other parameters, the initial temperature has little quantitative influence on the algorithm designed in this paper, and it is therefore not analyzed further.

Table 3 shows that the effects of the temperature decrease rate m and the inner iteration number B are clear, meaning that the choice of these two parameters decides whether the algorithm can obtain the optimal value. In different cases, the temperature decrease rate and the inner iteration number have different effects on the optimal value. In the best case, the inner iteration number is large enough to guarantee that the solutions are stable and, if it continues to increase, the solutions do not improve much. In this case, adjusting the temperature decrease rate can further narrow the neighborhood range and give more convergent results. Therefore, the temperature decrease rate is the influential parameter in the best case. In the worst case, the inner iteration number has a significant impact, since a lower inner iteration number causes a wider search range, which keeps the search far from the optimal solution. In this case, adjusting the temperature decrease rate makes it easier for the result to move to a local optimal solution. Furthermore, Table 3 shows that the combined effect of the temperature decrease rate and the inner iteration number is also clear.

Table 3. Index effects on numerical results.

https://doi.org/10.1371/journal.pone.0154913.t003
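Main and interaction effects of the kind discussed in this section can be computed from a two-level design as follows. The sketch uses a full 2³ factorial with levels coded −1 and +1, which is an assumption about the design (the text describes levels chosen from Table 2), and a toy response in place of the averaged algorithm output. Function names and the toy response are hypothetical.

```python
from itertools import product

def full_factorial(factors):
    """All runs of a two-level design, with levels coded as -1 and +1."""
    return list(product((-1, 1), repeat=len(factors)))

def effect(runs, responses, columns):
    """Mean response where the product of the coded columns is +1, minus the mean where it is -1."""
    plus, minus = [], []
    for run, y in zip(runs, responses):
        sign = 1
        for c in columns:
            sign *= run[c]
        (plus if sign > 0 else minus).append(y)
    return sum(plus) / len(plus) - sum(minus) / len(minus)

factors = ["m", "B", "eps"]
runs = full_factorial(factors)

# Toy response with a known structure: effects of m and B plus an m-by-B interaction.
responses = [2 * m + 3 * B + m * B for m, B, eps in runs]

main_m = effect(runs, responses, [0])
main_B = effect(runs, responses, [1])
main_eps = effect(runs, responses, [2])
inter_mB = effect(runs, responses, [0, 1])
print("main effects:", main_m, main_B, main_eps, "interaction m x B:", inter_mB)
```

Passing a single column index gives a main effect, while passing two indices gives the interaction between those factors, which is the quantity a one-at-a-time sweep cannot detect.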

Conclusions

In this paper, we discuss how to determine the optimal proportion of a series of cross-sections in all samples so as to optimize survey design efficiency in the analysis-of-variance model, a proportion that can be applied directly in sampling. First, we derive a theorem for choosing the optimal proportion of a series of cross-sections in all samples, irrespective of the parameters of interest and budget. In addition, our results show that, compared to a pure series of cross-sections or a pure panel, the gains from choosing a split panel can be substantial. Second, the efficiency of split panel design given a budget is considered, and an efficient algorithm is designed to solve the constrained nonlinear integer optimization associated with the efficiency of survey designs on a budget. We further apply the OATD and FD methods to analyze and compare the quantitative influence of different parameter selections in the implementation of the algorithm, with an empirical example concerning monthly consumer expenditure on food in 1985 in the Netherlands, and obtain the efficient ranges of the algorithm parameters that ensure a good solution.

For further research, we will extend the results to a more general analysis-of-covariance model and derive the expressions for the variances of efficient parameter estimators. At the same time, other algorithms can be used to solve the new nonlinear programming problem arising from the optimal split panel design in the analysis-of-covariance model.

Appendix 1

Appendix 1A. Proof of constructing the orthogonal matrix Q

Proof: (51)

Therefore, matrix Q is an orthogonal matrix.

is the n-repeated eigenvalue of real symmetric matrix , so (52)

Let ς^T denote the nonzero row vector of , so and ς^TX = 0 have the same solutions. Let B₁ denote the matrix removing the first column of matrix Q, (53) and (54)

Subsequently, (55) (56)

As such, the bottom n−1 column vectors of matrix Q are the solutions of ς^TX = 0. Matrix Q is an orthogonal matrix and it is impossible for all the column vectors of matrix Q to be zero, therefore, the bottom n−1 column vectors of matrix Q are eigenvectors belonging to eigenvalue of matrix . The n column vectors of the orthogonal matrix constitute a unit orthogonal vector group, and the bottom n−1 column vectors are unit orthogonal vector group constituted of n−1 eigenvectors belonging to eigenvalue of matrix . The first column vector of matrix Q is a unit eigenvector belonging to eigenvalue of matrix . Moreover, different eigenvectors belonging to different eigenvalues are orthogonal, and the unit vector that is orthogonal with the bottom n−1 column vectors of matrix Q is unit eigenvector belonging to eigenvalue of matrix . Additionally, matrix Q is an orthogonal matrix, and the first column vector of matrix Q is the unit eigenvector belonging to eigenvalue of matrix .

Appendix 1B. Proof of theorem 1

Proof:

Let (57) (58) and (59)

Subsequently, (60)

Therefore, (61)

Let (62) thus (63) when λ∈[0,1],.

In detail, (64)

Let (65) (66) (67)

Subsequently, (68)

Let Δ_k = b²−4ac, and as a result (69)

From the above results,

When λ∈[0,1], c<0 and k(1)<0, M(1) is the minimum value of M(λ).

When λ∈[0,1] and c>0, M(0) is the minimum value of M(λ).

When λ∈[0,1], a<0, c<0, and k(1)>0, M(k_r) is the minimum value of M(λ), where k_r is the right root of k(λ).

When λ∈[0,1], a>0, c<0, and k(1)>0, M(k_l) is the minimum value of M(λ), where k_l is the left root of k(λ).

When a = 0, and then c<0 and b<0, M(λ₀) is the minimum value of M(λ).

Appendix 1C. Detailed design process of proposed algorithm

Simulated annealing begins with an initial solution and then generates a neighboring solution, either randomly or by using a pre-specified rule. The state moves from the current solution to a candidate solution in a process that minimizes the energy, with moves governed by the Metropolis acceptance criterion. As such, the candidate solution is accepted according to an acceptance probability. We consider a number of conditions and, subsequently, present the steps of the proposed algorithm.

In order to converge to the global optimal solution, we choose (70) as the acceptance criterion, which is called the Metropolis criterion, where η is a random number distributed uniformly over [0,1] and the state acceptance function is denoted by (71) f is the objective function of (P2); X^K and Y^K are the current iteration point and the new iteration point, respectively; T_K is the temperature at stage K of the algorithm, which is obtained from the cooling schedule presented in Eq (50); β is a positive constant. In order to guarantee an integer optimal solution, the new iteration point Y^K is generated by the following process: (72) where (73) and is the ith component of the random vector Z^K; U₁,U₂,…,U_n is a group of random variables distributed uniformly over [−1,1], which are independent of each other; sign(⋅) is the sign function; 〈⋅〉 denotes rounding to an integer.

The performance and convergence of simulated annealing are crucial and are affected by the cooling schedule. If T decreases quickly, convergence is fast, but the algorithm then has difficulty reaching the global optimal solution. Therefore, we adopt the following cooling schedule, which is intended to preserve convergence to the global optimal solution: (74) where T₀ is an initial temperature; m≥1 and m is an integer constant, which determines the speed of the decreasing temperature. The choice of T₀ may be crucial; sophisticated techniques for choosing it are discussed by Van Laarhoven and Aarts [12].

Finally, the stopping criterion for the simulated annealing algorithm is given by (75) where ε denotes a sufficiently small number.
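To show how the components of this appendix fit together, the following Python sketch runs a complete annealing loop on a toy integer problem. It is an illustration under stated assumptions, not the authors' implementation: the acceptance function is the standard Metropolis form, the neighbor move is a rounded uniform step, the cooling is geometric, and the loop stops once the temperature falls below ε. The toy objective and all parameter values are hypothetical.

```python
import math
import random


def anneal_integer(f, x0, t0=10.0, alpha=0.95, beta=1.0, step=3.0, eps=1e-3, seed=0):
    """Minimize f over integer vectors with a basic simulated annealing loop."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, f_best = list(x), fx
    t = t0
    while t >= eps:  # assumed stopping rule: temperature below eps
        # Assumed integer move: rounded uniform perturbation of each component.
        y = [xi + math.floor(step * rng.uniform(-1.0, 1.0) + 0.5) for xi in x]
        fy = f(y)
        # Assumed standard Metropolis acceptance probability.
        accept_prob = 1.0 if fy <= fx else math.exp(-beta * (fy - fx) / t)
        if accept_prob >= rng.random():
            x, fx = y, fy
            if fx < f_best:
                best, f_best = list(x), fx
        t *= alpha  # assumed geometric cooling, not Eq (74)
    return best, f_best


def toy_objective(x):
    """Hypothetical objective with integer minimizer (3, -2)."""
    return (x[0] - 3) ** 2 + (x[1] + 2) ** 2


if __name__ == "__main__":
    print(anneal_integer(toy_objective, (10, 10)))
```

The loop keeps track of the best point visited, so the returned objective value is never worse than that of the starting point even when worsening moves are accepted along the way.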

Acknowledgments

I would like to extend my sincere gratitude to John Knight, Yan Xu and Ming Huang for their instructive advice and useful suggestions for my paper.

Author Contributions

Conceived and designed the experiments: XL. Performed the experiments: XL. Analyzed the data: XL. Wrote the paper: XL. Contributed to revising the language: WW HL.

References

1. Binder D.A., Hidiroglou M.A. Sampling in time. Krishnaiah P.R. and Rao C.R., eds. 1988.
2. Kish L. Data collection for details over space and time. Statistical method and the improvement of data quality, 1983; 28(2):73–84.
3. Kish L. Timing of surveys for public policy. Austral. J. Statist. 1986; 28(1):1–12.
4. Rodríguez-Oreggia E. Hurricanes and labor market outcomes: Evidence for Mexico. Global environmental change. 2013; 23(2):351–359.
5. James B.T. The effects of questionnaire mode on response in a federal employee survey: mail versus electronic mail. Treat special contributed paper at the American statistical association annual conference. 1997.
6. Dhaene G., Jochmans K. Split-panel jackknife estimation of fixed-effect models, working paper. Sciences Po department of economics. 2014.
7. Veiga A., Peter W.F.S., James J.B. The use of sample weights in multivariate multilevel models with an application to income data collected by using a rotating panel survey. Journal of the royal statistical society. 2014; 63(1):65–84.
8. Patterson H.D. Sampling on successive occasions with partial replacement of units. Journal of the royal statistical society. 1950; 12(1):241–255.
9. Eckler A.R. Rotation sampling. Annals of mathematical statistics. 1955; 26(2):664–685.
10. Cochran W. Sampling techniques. New York: Wiley. 1977.
11. Nijman T.H.E., Verbeek M. Estimation of time dependent parameters in linear models using cross-sections, panels or both. Journal of Econometrics. 1990; 46(1):333–346.
12. Van Laarhoven P. J. M., Aarts E. H. L. Simulated annealing: theory and applications. Netherlands: Kluwer. 1987.
13. Campbell J. Photosynthetic control of atmospheric carbonyl sulfide during the growing season. Science. 2008; 322(1):1085–1088.
14. Bailis R., Ezzati M., Kammen D. Mortality and greenhouse gas impacts of biomass and petroleum energy futures in Africa. Science. 2005; 308(2):98–103.
15. Murphy J. Quantification of modelling uncertainties in a large ensemble of climate change simulations. Nature. 2004; 430(1):768–772.
16. Saltelli A., Chan K., Scott E.M. Sensitivity analysis. New York. 2000.
17. Saltelli A., Annoni P. How to avoid a perfunctory sensitivity analysis. Environmental modeling and software. 2010; 25(1):1508–1517.
18. Nijman T.H.E., Verbeek M., Soest A.V. The efficiency of rotating-panel designs in an analysis of variance model. Journal of econometrics. 1991; 49(1):373–399.
19. Duncan G.J., Juster F.T., Morgan J.N. The role of panel studies in research on economic behavior. Transportation research. 1987; 21(2):249–263.

